Tutorial Brief

We will cover some methods of the YouTube Data API v3, available through the Google Developers Console.

Important Links:

We will use the following functions:

  • youtube.search.list Doc
  • youtube.videos.list Doc

Video Tutorial:

https://www.youtube.com/watch?v=bCrCkfSyuNE

Google APIs

Google provides a Python client library, but we will access the API with plain HTTP requests.


In [73]:
api_key = ""

Import Libraries


In [57]:
from __future__ import division
from datetime import datetime 
import requests
from lxml import html, etree
import json
from textblob import TextBlob

import numpy as np  # needed for dtype=np.int64 in _video_list
import pandas as pd

import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')

pd.options.display.max_columns = 100
pd.options.display.max_rows = 35
pd.options.display.width = 120

Searching YouTube Using youtube.search.list

Documentation:

https://developers.google.com/youtube/v3/docs/search

HTTPS Request:

GET https://www.googleapis.com/youtube/v3/search

Parameters:

Parameter name Value Description
Required parameters
part string The part parameter specifies a comma-separated list of one or more search resource properties that the API response will include. Set the parameter value to snippet. The snippet part has a quota cost of 1 unit.
Filters (specify 0 or 1 of the following parameters)
forContentOwner boolean This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners.

The forContentOwner parameter restricts the search to only retrieve resources owned by the content owner specified by the onBehalfOfContentOwner parameter. The user must be authenticated using a CMS account linked to the specified content owner and onBehalfOfContentOwner must be provided.
forMine boolean This parameter can only be used in a properly authorized request. The forMine parameter restricts the search to only retrieve videos owned by the authenticated user. If you set this parameter to true, then the type parameter's value must also be set to video.
relatedToVideoId string The relatedToVideoId parameter retrieves a list of videos that are related to the video that the parameter value identifies. The parameter value must be set to a YouTube video ID and, if you are using this parameter, the type parameter must be set to video.
Optional parameters
channelId string The channelId parameter indicates that the API response should only contain resources created by the channel
channelType string The channelType parameter lets you restrict a search to a particular type of channel.

Acceptable values are:
  • any – Return all channels.
  • show – Only retrieve shows.
eventType string The eventType parameter restricts a search to broadcast events. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • completed – Only include completed broadcasts.
  • live – Only include active broadcasts.
  • upcoming – Only include upcoming broadcasts.
location string The location parameter, in conjunction with the locationRadius parameter, defines a circular geographic area and also restricts a search to videos that specify, in their metadata, a geographic location that falls within that area. The parameter value is a string that specifies latitude/longitude coordinates e.g. (37.42307,-122.08427).

  • The location parameter value identifies the point at the center of the area.
  • The locationRadius parameter specifies the maximum distance that the location associated with a video can be from that point for the video to still be included in the search results.
The API returns an error if your request specifies a value for the location parameter but does not also specify a value for the locationRadius parameter.
locationRadius string The locationRadius parameter, in conjunction with the location parameter, defines a circular geographic area.

The parameter value must be a floating point number followed by a measurement unit. Valid measurement units are m, km, ft, and mi. For example, valid parameter values include 1500m, 5km, 10000ft, and 0.75mi. The API does not support locationRadius parameter values larger than 1000 kilometers.

Note: See the definition of the location parameter for more information.
maxResults unsigned integer The maxResults parameter specifies the maximum number of items that should be returned in the result set. Acceptable values are 0 to 50, inclusive. The default value is 5.
onBehalfOfContentOwner string This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners.

The onBehalfOfContentOwner parameter indicates that the request's authorization credentials identify a YouTube CMS user who is acting on behalf of the content owner specified in the parameter value. This parameter is intended for YouTube content partners that own and manage many different YouTube channels. It allows content owners to authenticate once and get access to all their video and channel data, without having to provide authentication credentials for each individual channel. The CMS account that the user authenticates with must be linked to the specified YouTube content owner.
order string The order parameter specifies the method that will be used to order resources in the API response. The default value is relevance.

Acceptable values are:
  • date – Resources are sorted in reverse chronological order based on the date they were created.
  • rating – Resources are sorted from highest to lowest rating.
  • relevance – Resources are sorted based on their relevance to the search query. This is the default value for this parameter.
  • title – Resources are sorted alphabetically by title.
  • videoCount – Channels are sorted in descending order of their number of uploaded videos.
  • viewCount – Resources are sorted from highest to lowest number of views.
pageToken string The pageToken parameter identifies a specific page in the result set that should be returned. In an API response, the nextPageToken and prevPageToken properties identify other pages that could be retrieved.
publishedAfter datetime The publishedAfter parameter indicates that the API response should only contain resources created after the specified time. The value is an RFC 3339 formatted date-time value (1970-01-01T00:00:00Z).
publishedBefore datetime The publishedBefore parameter indicates that the API response should only contain resources created before the specified time. The value is an RFC 3339 formatted date-time value (1970-01-01T00:00:00Z).
q string The q parameter specifies the query term to search for.

Your request can also use the Boolean NOT (-) and OR (|) operators to exclude videos or to find videos that are associated with one of several search terms. For example, to search for videos matching either "boating" or "sailing", set the q parameter value to boating|sailing. Similarly, to search for videos matching either "boating" or "sailing" but not "fishing", set the q parameter value to boating|sailing -fishing. Note that the pipe character must be URL-escaped when it is sent in your API request. The URL-escaped value for the pipe character is %7C.
regionCode string The regionCode parameter instructs the API to return search results for the specified country. The parameter value is an ISO 3166-1 alpha-2 country code.
safeSearch string The safeSearch parameter indicates whether the search results should include restricted content as well as standard content.

Acceptable values are:
  • moderate – YouTube will filter some content from search results and, at the least, will filter content that is restricted in your locale. Based on their content, search results could be removed from search results or demoted in search results. This is the default parameter value.
  • none – YouTube will not filter the search result set.
  • strict – YouTube will try to exclude all restricted content from the search result set. Based on their content, search results could be removed from search results or demoted in search results.
topicId string The topicId parameter indicates that the API response should only contain resources associated with the specified topic. The value identifies a Freebase topic ID.
type string The type parameter restricts a search query to only retrieve a particular type of resource. The value is a comma-separated list of resource types. The default value is video,channel,playlist.

Acceptable values are:
  • channel
  • playlist
  • video
videoCaption string The videoCaption parameter indicates whether the API should filter video search results based on whether they have captions. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Do not filter results based on caption availability.
  • closedCaption – Only include videos that have captions.
  • none – Only include videos that do not have captions.
videoCategoryId string The videoCategoryId parameter filters video search results based on their category. If you specify a value for this parameter, you must also set the type parameter's value to video.
videoDefinition string The videoDefinition parameter lets you restrict a search to only include either high definition (HD) or standard definition (SD) videos. HD videos are available for playback in at least 720p, though higher resolutions, like 1080p, might also be available. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Return all videos, regardless of their resolution.
  • high – Only retrieve HD videos.
  • standard – Only retrieve videos in standard definition.
videoDimension string The videoDimension parameter lets you restrict a search to only retrieve 2D or 3D videos. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • 2d – Restrict search results to exclude 3D videos.
  • 3d – Restrict search results to only include 3D videos.
  • any – Include both 3D and non-3D videos in returned results. This is the default value.
videoDuration string The videoDuration parameter filters video search results based on their duration. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Do not filter video search results based on their duration. This is the default value.
  • long – Only include videos longer than 20 minutes.
  • medium – Only include videos that are between four and 20 minutes long (inclusive).
  • short – Only include videos that are less than four minutes long.
videoEmbeddable string The videoEmbeddable parameter lets you restrict a search to only videos that can be embedded into a webpage. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Return all videos, embeddable or not.
  • true – Only retrieve embeddable videos.
videoLicense string The videoLicense parameter filters search results to only include videos with a particular license. YouTube lets video uploaders choose to attach either the Creative Commons license or the standard YouTube license to each of their videos. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Return all videos, regardless of which license they have, that match the query parameters.
  • creativeCommon – Only return videos that have a Creative Commons license. Users can reuse videos with this license in other videos that they create.
  • youtube – Only return videos that have the standard YouTube license.
videoSyndicated string The videoSyndicated parameter lets you restrict a search to only videos that can be played outside youtube.com. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Return all videos, syndicated or not.
  • true – Only retrieve syndicated videos.
videoType string The videoType parameter lets you restrict a search to a particular type of videos. If you specify a value for this parameter, you must also set the type parameter's value to video.

Acceptable values are:
  • any – Return all videos.
  • episode – Only retrieve episodes of shows.
  • movie – Only retrieve movies.

The important parameters:

  • part:

    • id: Returns only resource ID data
    • snippet: Returns some basic metadata about the resource
  • channelId:

    • Filter results to a single channelId.
  • maxResults:

    • Between 0 and 50 results per page. The default is 5.
  • order:

    • date: Resources are sorted in reverse chronological order based on the date they were uploaded.
    • rating: Resources are sorted from highest to lowest rating.
    • relevance: Resources are sorted based on their relevance to the search query. This is the default value for this parameter.
    • title: Resources are sorted alphabetically by title.
    • videoCount: Channels are sorted in descending order of their number of uploaded videos.
    • viewCount: Resources are sorted from highest to lowest number of views.
  • pageToken:

    • A token string that selects which page of results to return
  • publishedAfter:

    • Use an RFC 3339 date-time value, e.g. 2000-12-31T23:59:59Z
  • publishedBefore:

    • Use an RFC 3339 date-time value, e.g. 2000-12-31T23:59:59Z
  • q:

    • Query term(s)
    • You can use multiple search terms
    • For OR operator use |
    • For NOT operator use -
  • key:

    • Your API key
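
To make the q operators listed above concrete, here is a minimal sketch (written for Python 3, standard library only) that joins alternatives with | and exclusions with -, then shows how the pipe is percent-encoded. The helper name build_query is hypothetical and not part of the API. When the raw string is passed to requests through params=, manual escaping should not be needed.

```python
from urllib.parse import urlencode


def build_query(any_of, none_of=()):
    """Join alternatives with | (OR) and prefix exclusions with - (NOT)."""
    query = "|".join(any_of)
    for term in none_of:
        query += " -" + term
    return query


q = build_query(["boating", "sailing"], none_of=["fishing"])
encoded = urlencode({"q": q})  # the pipe becomes %7C, the space becomes +

print(q)
print(encoded)
```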

Preparing The HTTP Request


In [3]:
parameters = {"part": "snippet",
              "maxResults": 5,
              "order": "date",
              "pageToken": "",
              "publishedAfter": "2008-08-04T00:00:00Z",
              "publishedBefore": "2008-11-04T00:00:00Z",
              "q": "",
              "key": api_key,
              "type": "video",
              }
url = "https://www.googleapis.com/youtube/v3/search"
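
The publishedAfter and publishedBefore values above are typed by hand. As a hedged sketch, they could also be generated from datetime objects. The helper name to_rfc3339 is hypothetical, and the code assumes that naive datetimes are already in UTC.

```python
from datetime import datetime, timezone


def to_rfc3339(moment):
    """Format a datetime as an RFC 3339 UTC timestamp, e.g. 2008-08-04T00:00:00Z."""
    if moment.tzinfo is None:
        # Assumption: a datetime without tzinfo is already expressed in UTC.
        moment = moment.replace(tzinfo=timezone.utc)
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


published_after = to_rfc3339(datetime(2008, 8, 4))
published_before = to_rfc3339(datetime(2008, 11, 4))
print(published_after, published_before)
```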

Fetch Results for a Single Page


In [4]:
parameters["q"] = "Mark Udall"
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print(page.text)


{
 "kind": "youtube#searchListResponse",
 "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/_2hFMhP6zvFl7CAy5D9Ir40dMWE\"",
 "nextPageToken": "CAUQAA",
 "pageInfo": {
  "totalResults": 2325,
  "resultsPerPage": 5
 },
 "items": [
  {
   "kind": "youtube#searchResult",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/tmAMwya2pvXlrX05odd04vzKBSQ\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "5Q98TvXjIZg"
   },
   "snippet": {
    "publishedAt": "2008-11-03T15:31:30.000Z",
    "channelId": "UC52X5wxOL_s5yw0dQk7NtgA",
    "title": "Cousins Vying to Ride Democratic Wave to Senate",
    "description": "Cousins Tom and Mark Udall are vying to become U.S. Senators in New Mexico and Colorado. The two are hoping to ride an emerging Democratic wave in the ...",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/default.jpg"
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/mqdefault.jpg"
     },
     "high": {
      "url": "https://i.ytimg.com/vi/5Q98TvXjIZg/hqdefault.jpg"
     }
    },
    "channelTitle": "AssociatedPress",
    "liveBroadcastContent": "none"
   }
  },
  {
   "kind": "youtube#searchResult",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/Pqtrk7f6rZM5jwPLKtXoJ98nNtg\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "nnghUTeSKW0"
   },
   "snippet": {
    "publishedAt": "2008-11-03T00:06:40.000Z",
    "channelId": "UC9ZGcEDoHfuY8lB5_SknuLA",
    "title": "mark udall",
    "description": "gov project.",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/nnghUTeSKW0/default.jpg"
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/nnghUTeSKW0/mqdefault.jpg"
     },
     "high": {
      "url": "https://i.ytimg.com/vi/nnghUTeSKW0/hqdefault.jpg"
     }
    },
    "channelTitle": "g072091",
    "liveBroadcastContent": "none"
   }
  },
  {
   "kind": "youtube#searchResult",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/ay0GP1CevugYOb4FvBtJzXG_A0c\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "Pq-KnAMpDHs"
   },
   "snippet": {
    "publishedAt": "2008-11-01T00:55:26.000Z",
    "channelId": "UC5QhjJAjxtRvJ9ujFxNiJbA",
    "title": "Eden Lane One on One with Congressman Mark Udall",
    "description": "Senate candidate, Congressman Mark Udall spoke with me at a campaign event. Congressional candidate Jared Polis, and State Senate Candidate Joe ...",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/default.jpg"
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/mqdefault.jpg"
     },
     "high": {
      "url": "https://i.ytimg.com/vi/Pq-KnAMpDHs/hqdefault.jpg"
     }
    },
    "channelTitle": "missedenlane",
    "liveBroadcastContent": "none"
   }
  },
  {
   "kind": "youtube#searchResult",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/y4ELzTNng5HKENL9FoHgkDV304k\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "aITDlrkKOoY"
   },
   "snippet": {
    "publishedAt": "2008-11-01T00:42:17.000Z",
    "channelId": "UCT3P1V7_N5HzV1vEZUSdNNQ",
    "title": "CO: AFGE, APWU, NALC, and NPMHU leaflet with Mark Udall",
    "description": "APWU, NALC, and NPMHU are out at the worksite when it matters most!",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/aITDlrkKOoY/default.jpg"
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/aITDlrkKOoY/mqdefault.jpg"
     },
     "high": {
      "url": "https://i.ytimg.com/vi/aITDlrkKOoY/hqdefault.jpg"
     }
    },
    "channelTitle": "shubi10",
    "liveBroadcastContent": "none"
   }
  },
  {
   "kind": "youtube#searchResult",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/JgS_14GwklWRyGGiyMW8NbsyiQA\"",
   "id": {
    "kind": "youtube#video",
    "videoId": "JAHI1pSiEPM"
   },
   "snippet": {
    "publishedAt": "2008-10-30T04:43:12.000Z",
    "channelId": "UCxdp8upAlGFfB4jjTH3wAHw",
    "title": "[SEN-CO] Udall: Reason",
    "description": "http://politicalrealm.blogspot.com A new campaign ad from Democrat Mark Udall.",
    "thumbnails": {
     "default": {
      "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/default.jpg"
     },
     "medium": {
      "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/mqdefault.jpg"
     },
     "high": {
      "url": "https://i.ytimg.com/vi/JAHI1pSiEPM/hqdefault.jpg"
     }
    },
    "channelTitle": "PoliticalRealm",
    "liveBroadcastContent": "none"
   }
  }
 ]
}
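
The response above is deeply nested, so a small sketch of how the useful fields might be pulled out may help. The sample below is trimmed from the output shown above; the function name summarize is hypothetical.

```python
import json

# Two items trimmed from the search response printed above.
sample = json.loads("""
{
 "nextPageToken": "CAUQAA",
 "items": [
  {"id": {"kind": "youtube#video", "videoId": "5Q98TvXjIZg"},
   "snippet": {"title": "Cousins Vying to Ride Democratic Wave to Senate",
               "channelTitle": "AssociatedPress"}},
  {"id": {"kind": "youtube#video", "videoId": "nnghUTeSKW0"},
   "snippet": {"title": "mark udall", "channelTitle": "g072091"}}
 ]
}
""")


def summarize(response):
    """Return (videoId, title, channelTitle) tuples for video results only."""
    rows = []
    for item in response.get("items", []):
        resource = item.get("id", {})
        if resource.get("kind") != "youtube#video":
            continue  # channel and playlist results carry a different id field
        snippet = item.get("snippet", {})
        rows.append((resource["videoId"], snippet.get("title"), snippet.get("channelTitle")))
    return rows


videos = summarize(sample)
next_token = sample.get("nextPageToken")  # pass as pageToken to request the next page
for video_id, title, channel in videos:
    print(video_id, "|", title, "|", channel)
```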

YouTube Video Metadata Using youtube.videos.list

Documentation:

https://developers.google.com/youtube/v3/docs/videos/list

HTTPS Request:

GET https://www.googleapis.com/youtube/v3/videos

Parameters:

Parameter name Value Description
Required parameters
part string The part parameter specifies a comma-separated list of one or more video resource properties that the API response will include.

If the parameter identifies a property that contains child properties, the child properties will be included in the response. For example, in a video resource, the snippet property contains the channelId, title, description, tags, and categoryId properties. As such, if you set part=snippet, the API response will contain all of those properties.

The list below contains the part names that you can include in the parameter value and the quota cost for each part:
  • contentDetails: 2
  • fileDetails: 1
  • id: 0
  • liveStreamingDetails: 2
  • player: 0
  • processingDetails: 1
  • recordingDetails: 2
  • snippet: 2
  • statistics: 2
  • status: 2
  • suggestions: 1
  • topicDetails: 2
Filters (specify exactly one of the following parameters)
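chart string The chart parameter identifies the chart that you want to retrieve.

Acceptable values are:
  • mostPopular – Return the most popular videos for the specified content region and video category.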
id string The id parameter specifies a comma-separated list of the YouTube video ID(s) for the resource(s) that are being retrieved. In a video resource, the id property specifies the video's ID.
myRating string This parameter can only be used in a properly authorized request. Set this parameter's value to like or dislike to instruct the API to only return videos liked or disliked by the authenticated user.

Acceptable values are:
  • dislike – Returns only videos disliked by the authenticated user.
  • like – Returns only videos liked by the authenticated user.
Optional parameters
maxResults unsigned integer The maxResults parameter specifies the maximum number of items that should be returned in the result set.

Note: This parameter is supported for use in conjunction with the myRating parameter, but it is not supported for use in conjunction with the id parameter. Acceptable values are 1 to 50, inclusive. The default value is 5.
onBehalfOfContentOwner string This parameter can only be used in a properly authorized request. Note: This parameter is intended exclusively for YouTube content partners.

The onBehalfOfContentOwner parameter indicates that the request's authorization credentials identify a YouTube CMS user who is acting on behalf of the content owner specified in the parameter value. This parameter is intended for YouTube content partners that own and manage many different YouTube channels. It allows content owners to authenticate once and get access to all their video and channel data, without having to provide authentication credentials for each individual channel. The CMS account that the user authenticates with must be linked to the specified YouTube content owner.
pageToken string The pageToken parameter identifies a specific page in the result set that should be returned. In an API response, the nextPageToken and prevPageToken properties identify other pages that could be retrieved.

Note: This parameter is supported for use in conjunction with the myRating parameter, but it is not supported for use in conjunction with the id parameter.
regionCode string The regionCode parameter instructs the API to select a video chart available in the specified region. This parameter can only be used in conjunction with the chart parameter. The parameter value is an ISO 3166-1 alpha-2 country code.
videoCategoryId string The videoCategoryId parameter identifies the video category for which the chart should be retrieved. This parameter can only be used in conjunction with the chart parameter. By default, charts are not restricted to a particular category. The default value is 0.

Preparing The HTTP Request


In [5]:
parameters = {"part": "statistics",
              "id": "5Q98TvXjIZg",
              "key": api_key,
              }
url = "https://www.googleapis.com/youtube/v3/videos"

In [6]:
page = requests.request(method="get", url=url, params=parameters)
j_results = json.loads(page.text)
print(page.text)


{
 "kind": "youtube#videoListResponse",
 "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/RI2HLqoe4gS1QbNV867B5089lmY\"",
 "pageInfo": {
  "totalResults": 1,
  "resultsPerPage": 1
 },
 "items": [
  {
   "kind": "youtube#video",
   "etag": "\"PSjn-HSKiX6orvNhGZvglLI2lvk/HdPfiQBFpxUe-eEq-EYCkg3p4b8\"",
   "id": "5Q98TvXjIZg",
   "statistics": {
    "viewCount": "58",
    "likeCount": "0",
    "dislikeCount": "0",
    "favoriteCount": "0",
    "commentCount": "0"
   }
  }
 ]
}
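
Note that the counters in the statistics part arrive as strings. A minimal sketch of converting them to integers follows; the name parse_statistics is hypothetical, and treating a missing counter as 0 is an assumption made here, not something the API prescribes.

```python
STAT_FIELDS = ("viewCount", "likeCount", "dislikeCount", "favoriteCount", "commentCount")


def parse_statistics(item):
    """Convert the string counters of one video item to integers."""
    stats = item.get("statistics", {})
    # Assumption: a counter that is absent from the response is counted as 0.
    return {field: int(stats.get(field, 0)) for field in STAT_FIELDS}


# The single item from the response printed above.
item = {"id": "5Q98TvXjIZg",
        "statistics": {"viewCount": "58", "likeCount": "0", "dislikeCount": "0",
                       "favoriteCount": "0", "commentCount": "0"}}
counts = parse_statistics(item)
print(counts)
```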

Process Data Range

I'll check the correlation between Senate election results and YouTube stats.

Colorado Senate - Gardner vs. Udall: Cory Gardner (R), Mark Udall (D)


In [7]:
def _search_list(q="", publishedAfter=None, publishedBefore=None, pageToken=""):
    parameters = {"part": "id",
                  "maxResults": 50,
                  "order": "viewCount",
                  "pageToken": pageToken,
                  "q": q,
                  "type": "video",
                  "key": api_key,
                  }
    url = "https://www.googleapis.com/youtube/v3/search"
    
    if publishedAfter: parameters["publishedAfter"] = publishedAfter
    if publishedBefore: parameters["publishedBefore"] = publishedBefore
    
    page = requests.request(method="get", url=url, params=parameters)
    return json.loads(page.text)

def search_list(q="", publishedAfter=None, publishedBefore=None, max_requests=10):
    # Collect video IDs page by page until there are no more pages
    # or max_requests requests have been made.
    pageToken = ""
    results = []

    for counter in range(max_requests):
        j_results = _search_list(q=q, publishedAfter=publishedAfter, publishedBefore=publishedBefore, pageToken=pageToken)
        items = j_results.get("items", None)
        if not items:
            return results
        results += [item["id"]["videoId"] for item in items]
        if "nextPageToken" not in j_results:
            return results
        pageToken = j_results["nextPageToken"]
    return results

def _video_list(video_id_list):
    # maxResults is omitted: it is not supported together with the id filter.
    parameters = {"part": "statistics",
                  "id": ",".join(video_id_list),
                  "key": api_key,
                  }
    url = "https://www.googleapis.com/youtube/v3/videos"
    page = requests.request(method="get", url=url, params=parameters)
    j_results = json.loads(page.text)
    df = pd.DataFrame([item["statistics"] for item in j_results["items"]], dtype=np.int64)
    df["video_id"] = [item["id"] for item in j_results["items"]]
    
    parameters["part"] = "snippet"
    page = requests.request(method="get", url=url, params=parameters)
    j_results = json.loads(page.text)
    df["publishedAt"] = [item["snippet"]["publishedAt"] for item in j_results["items"]]
    df["publishedAt"] = df["publishedAt"].apply(lambda x: datetime.strptime(x, "%Y-%m-%dT%H:%M:%S.000Z"))
    df["date"] = df["publishedAt"].apply(lambda x: x.date())
    df["week"] = df["date"].apply(lambda x: x.isocalendar()[1])
    df["channelId"] = [item["snippet"]["channelId"] for item in j_results["items"]]
    df["title"] = [item["snippet"]["title"] for item in j_results["items"]]
    df["description"] = [item["snippet"]["description"] for item in j_results["items"]]
    df["channelTitle"] = [item["snippet"]["channelTitle"] for item in j_results["items"]]
    df["categoryId"] = [item["snippet"]["categoryId"] for item in j_results["items"]]
    return df

def video_list(video_id_list):
    values = []
    for index, item in enumerate(video_id_list[::50]):
        t_index = index * 50
        values.append(_video_list(video_id_list[t_index:t_index+50]))
    return pd.concat(values)
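
The two patterns used above, following nextPageToken and splitting IDs into groups of 50, can be tried without an API key. The sketch below is a simplified illustration, not the exact code above: collect_ids, batches, fake_fetch and FAKE_PAGES are hypothetical names, and the canned pages stand in for real responses.

```python
def collect_ids(fetch_page, max_requests=10):
    """Follow nextPageToken until it disappears or max_requests is reached."""
    ids, token = [], ""
    for _ in range(max_requests):
        response = fetch_page(token)
        ids.extend(item["id"]["videoId"] for item in response.get("items", []))
        token = response.get("nextPageToken")
        if not token:
            break
    return ids


def batches(values, size=50):
    """Yield consecutive slices of at most `size` elements."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


# Hypothetical canned pages standing in for real API responses.
FAKE_PAGES = {
    "": {"items": [{"id": {"videoId": "a"}}, {"id": {"videoId": "b"}}],
         "nextPageToken": "p2"},
    "p2": {"items": [{"id": {"videoId": "c"}}]},
}


def fake_fetch(page_token):
    return FAKE_PAGES[page_token]


all_ids = collect_ids(fake_fetch)
batch_sizes = [len(batch) for batch in batches(list(range(120)))]
print(all_ids, batch_sizes)
```

Passing the page fetcher in as an argument keeps the loop testable offline; in the notebook the fetcher would be a thin wrapper around _search_list.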

Get Data for Two Candidates


In [8]:
def get_data(candidates, publishedAfter, publishedBefore):
    results_list = []
    for q in candidates:
        results = search_list(q=q,
                              publishedAfter=publishedAfter,
                              publishedBefore=publishedBefore,
                              max_requests=50)

        stat_data_set = video_list(results)
        stat_data_set["candidate_name"] = q
        results_list.append(stat_data_set)
    data_set = pd.concat(results_list)
    return data_set

def get_2008_data(candidates):
    return get_data(candidates, publishedAfter="2008-08-04T00:00:00Z", publishedBefore="2008-11-04T00:00:00Z")

def get_2010_data(candidates):
    return get_data(candidates, publishedAfter="2010-08-04T00:00:00Z", publishedBefore="2010-11-04T00:00:00Z")

def get_2012_data(candidates):
    return get_data(candidates, publishedAfter="2012-08-04T00:00:00Z", publishedBefore="2012-11-04T00:00:00Z")

def get_2014_data(candidates):
    return get_data(candidates, publishedAfter="2014-08-04T00:00:00Z", publishedBefore="2014-11-04T00:00:00Z")
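
The four helpers above differ only in the year. As one possible alternative, the date window could be derived from the year itself. The name campaign_window is hypothetical; the 4 August to 4 November window is copied from the helpers above.

```python
from datetime import date


def campaign_window(year, start=(8, 4), end=(11, 4)):
    """Return (publishedAfter, publishedBefore) strings for one election year."""
    after = date(year, *start).isoformat() + "T00:00:00Z"
    before = date(year, *end).isoformat() + "T00:00:00Z"
    return after, before


windows = {year: campaign_window(year) for year in (2008, 2010, 2012, 2014)}
for year in sorted(windows):
    print(year, windows[year])
```

The resulting pair could be passed straight to get_data as publishedAfter and publishedBefore.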

Analyzing Colorado Senate Race for 2014


In [9]:
candidates = ["Cory Gardner", "Mark Udall"] # Cory Gardner (R), Mark Udall (D)*
colorado_2014_ds = get_2014_data(candidates)
pd.pivot_table(colorado_2014_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
               aggfunc='sum', index="candidate_name")


Out[9]:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount
candidate_name
Cory Gardner             304           167              0        437     234669
Mark Udall               195           470              0        450     144744

In [10]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()
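
The plot above counts videos per ISO week number, the same quantity stored in the week column. For readers without pandas at hand, here is a small standard-library sketch of that grouping, using the publishedAt values from the sample search response earlier in this tutorial. The helper name iso_week is hypothetical.

```python
from collections import Counter
from datetime import datetime

# publishedAt values copied from the sample search response shown earlier.
published = [
    "2008-11-03T15:31:30.000Z",
    "2008-11-03T00:06:40.000Z",
    "2008-11-01T00:55:26.000Z",
    "2008-11-01T00:42:17.000Z",
    "2008-10-30T04:43:12.000Z",
]


def iso_week(timestamp):
    """Return the ISO week number of a timestamp with a fractional-second suffix."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return moment.isocalendar()[1]


videos_per_week = Counter(iso_week(value) for value in published)
for week in sorted(videos_per_week):
    print(week, videos_per_week[week])
```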



In [11]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()



In [12]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()



In [13]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = colorado_2014_ds[colorado_2014_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()


How Predictive Was It in 2012?

Virginia Senate - Allen vs. Kaine


In [14]:
candidates = ["George Allen", "Tim Kaine"] # George Allen (R), Tim Kaine (D) winner
va_2012_ds = get_2012_data(candidates)
pd.pivot_table(va_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
               aggfunc='sum', index="candidate_name")


Out[14]:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount
candidate_name
George Allen             297           352              0        475     203297
Tim Kaine                174            97              0        553     248367

In [15]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()



In [16]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()



In [17]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()



In [18]:
for candidate, color in zip(candidates, ["r", "b"]):
    cand = va_2012_ds[va_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, index=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()
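
The four plotting cells above differ only in the metric they aggregate per week. As a rough sketch of that shared step, the helper below computes weekly totals per candidate with the standard library only. The function name `weekly_totals`, the integer `week` values, and the sample rows are all assumptions made for illustration; the notebook itself gets its rows from `get_2012_data` and aggregates with pandas.

```python
from collections import defaultdict


def weekly_totals(records, metric=None):
    """Return {candidate: {week: total}}.

    With metric=None every record counts as 1 (videos published);
    otherwise the named field is summed (for example "viewCount").
    """
    totals = defaultdict(lambda: defaultdict(int))
    for rec in records:
        amount = 1 if metric is None else rec[metric]
        totals[rec["candidate_name"]][rec["week"]] += amount
    # Plain dicts with weeks in ascending order, ready to be plotted.
    return {name: dict(sorted(weeks.items())) for name, weeks in totals.items()}


# Hypothetical sample rows (not real YouTube figures).
sample = [
    {"candidate_name": "George Allen", "week": 40, "viewCount": 100, "likeCount": 3},
    {"candidate_name": "George Allen", "week": 40, "viewCount": 50, "likeCount": 1},
    {"candidate_name": "George Allen", "week": 41, "viewCount": 20, "likeCount": 0},
    {"candidate_name": "Tim Kaine", "week": 41, "viewCount": 70, "likeCount": 5},
]

published = weekly_totals(sample)             # videos published per week
views = weekly_totals(sample, "viewCount")    # views per week
print(published)
print(views)
```

Each inner dictionary maps a week to its total, so its keys and values can be passed to `plt.plot` in the same way as `dates` and `by_date.values` in the cells above.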


Nevada Senate - Heller vs. Berkley


In [19]:
candidates = ["Dean Heller", "Shelley Berkley"] # Dean Heller (R) *Winner, Shelley Berkley (D)
nv_2012_ds = get_2012_data(candidates)
print pd.pivot_table(nv_2012_ds, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
               aggfunc='sum', rows="candidate_name")

for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = cand["week"].value_counts()
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos Published")
plt.xlabel("Week")
plt.show()

for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, rows=["week"], values=["viewCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos viewCount")
plt.xlabel("Week")
plt.show()

for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, rows=["week"], values=["likeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos likeCount")
plt.xlabel("Week")
plt.show()

for candidate, color in zip(candidates, ["r", "b"]):
    cand = nv_2012_ds[nv_2012_ds["candidate_name"]==candidate]
    by_date = pd.pivot_table(cand, rows=["week"], values=["dislikeCount"], aggfunc="sum")
    by_date = by_date.sort_index()
    dates = by_date.index
    plt.plot(dates, by_date.values, "-o", label=candidate, c=color, linewidth=2)
plt.legend(loc="best")
plt.ylabel("Videos dislikeCount")
plt.xlabel("Week")
plt.show()


                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount
candidate_name                                                                  
Dean Heller               248           644              0        926     870677
Shelley Berkley           222           206              0        472     679636

Current Senate ($113^{th}$ Congress)

Get Current Senate Data


In [20]:
url = "http://www.senate.gov/general/contact_information/senators_cfm.xml"
response = requests.get(url)
tree = etree.fromstring(response.content)  # parse the raw bytes; avoids str() failing on non-ASCII text
print tree


<Element contact_information at 0x7ff0804d4170>

Store Data In Pandas Data Frame


In [21]:
member_full = [member.xpath("member_full")[0].text for member in tree.xpath("//member")]
senators = pd.DataFrame(member_full, columns=["member_full"])

senators["last_name"] = [member.xpath("last_name")[0].text for member in tree.xpath("//member")]
senators["first_name"] = [member.xpath("first_name")[0].text for member in tree.xpath("//member")]
senators["party"] = [member.xpath("party")[0].text for member in tree.xpath("//member")]
senators["state"] = [member.xpath("state")[0].text for member in tree.xpath("//member")]
senators["address"] = [member.xpath("address")[0].text for member in tree.xpath("//member")]
senators["phone"] = [member.xpath("phone")[0].text for member in tree.xpath("//member")]
senators["website"] = [member.xpath("website")[0].text for member in tree.xpath("//member")]
senators["bioguide_id"] = [member.xpath("bioguide_id")[0].text for member in tree.xpath("//member")]
senators["class"] = [member.xpath("class")[0].text for member in tree.xpath("//member")]

senators


Out[21]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 10 columns):
member_full    100  non-null values
last_name      100  non-null values
first_name     100  non-null values
party          100  non-null values
state          100  non-null values
address        100  non-null values
phone          100  non-null values
website        100  non-null values
bioguide_id    100  non-null values
class          100  non-null values
dtypes: object(10)

Control By Party


In [22]:
import numpy as np  # needed for np.zeros in the pie chart below

by_party = senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party

color_dict = {"D": "b",
              "R": "r",
              "I": "g"}


labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))

plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()


fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.axvline(x=50, color="black", alpha=0.7, linewidth=2)
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats Controlled by Party")
plt.show()


D    53
R    45
I     2
dtype: int64

Who is up for Re-election?

Class II senators are up for re-election in 2014.


In [23]:
class_2_senators = senators[senators["class"]=="Class II"]
by_party = class_2_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party

labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))

plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()

color_dict = {"D": "b",
              "R": "r",
              "I": "g"}

fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class II$ Controlled by Party")
plt.show()


D    20
R    13
dtype: int64

Looking at the other classes


In [24]:
class_3_senators = senators[senators["class"]=="Class III"]
by_party = class_3_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party

labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))

plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()

color_dict = {"D": "b",
              "R": "r",
              "I": "g"}

fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class III$ Controlled by Party")
plt.show()


R    24
D    10
dtype: int64

In [25]:
class_1_senators = senators[senators["class"]=="Class I"]
by_party = class_1_senators["party"].value_counts()
by_party.sort(ascending=False)
print by_party

labels = ["%s: %s" % (by_party.index[index], value) for index, value in enumerate(by_party)]
colors = list(pd.Series(by_party.index).map(color_dict))

plt.figure()
plt.axis("equal")
plt.pie(by_party.values, labels=labels, colors=colors, shadow=True, explode=np.zeros(len(by_party)) + 0.04)
plt.show()

color_dict = {"D": "b",
              "R": "r",
              "I": "g"}

fig = plt.figure()
axes = fig.add_subplot(111)
axes.barh(range(len(by_party.index)), by_party.values, color=colors)
plt.box(on="off")
axes.yaxis.set_ticks([item + 0.4 for item in range(len(by_party.index))])
axes.yaxis.set_ticklabels(by_party.index, minor=False)
plt.xlabel("$113^{th}$ Senate Seats of $Class I$ Controlled by Party")
plt.show()


D    23
R     8
I     2
dtype: int64
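
The three per-class cells above repeat the same counting step with a different class label. The sketch below shows that step as one loop using only the standard library. The per-class numbers are copied from the printed outputs above; the `party_counts` helper and the rebuilt `rows` list are illustrative stand-ins for the notebook's `senators` data frame, not part of it.

```python
from collections import Counter

# Party counts per class, copied from the outputs printed above.
printed = {
    "Class I": {"D": 23, "R": 8, "I": 2},
    "Class II": {"D": 20, "R": 13},
    "Class III": {"R": 24, "D": 10},
}

# Rebuild one (class, party) row per senator so the helper works on rows,
# the way the notebook works on data frame rows.
rows = [(cls, party)
        for cls, counts in printed.items()
        for party, n in counts.items()
        for _ in range(n)]


def party_counts(rows, senate_class=None):
    """Count parties, optionally restricted to one class, largest first."""
    counter = Counter(party for cls, party in rows
                      if senate_class is None or cls == senate_class)
    return counter.most_common()


for senate_class in ["Class I", "Class II", "Class III"]:
    print(senate_class, party_counts(rows, senate_class))

overall = dict(party_counts(rows))
print(overall)
```

Summing the three classes reproduces the chamber-wide split printed in the "Control By Party" cell (D 53, R 45, I 2), which is a quick consistency check on the class filter.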

Forecasting Results of Senate Elections 2014

Start by listing all seats in $Class II$


In [26]:
class_2_senators = senators[senators["class"]=="Class II"].sort("state")
class_2_senators


Out[26]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 33 entries, 4 to 29
Data columns (total 10 columns):
member_full    33  non-null values
last_name      33  non-null values
first_name     33  non-null values
party          33  non-null values
state          33  non-null values
address        33  non-null values
phone          33  non-null values
website        33  non-null values
bioguide_id    33  non-null values
class          33  non-null values
dtypes: object(10)

Get Competitors

Fetch Data


In [63]:
url = "http://www.fec.gov/data/CandidateSummary.do?format=xml"
response = requests.get(url)
page = html.fromstring(response.content)  # parse the raw bytes; avoids str() failing on non-ASCII text
print response.text[:1000]


<data.fec.gov xmlns:fecdc="http://www.w3.org/2001/XMLSchema-instance" fecdc:schemaLocation="/data /finance/disclosure/schema/CandidateSummary.xsd"><title>Candidate Summary</title><description>This file contains information for each candidate who has registered with the FEC or appears on an official state ballot for an election to the U.S. House of Representatives, U.S. Senate or U.S. President. The table is available for the current election cycle and for election cycles through 2008.</description><timestamp>2014-10-09T05:06:27-05:00</timestamp><copyright>Copyright 2014, Federal Election Commission.</copyright><can_sum><lin_ima>http://www.fec.gov/fecviewer/CandidateCommitteeDetail.do?candidateCommitteeId=H4UT04052&amp;tabIndex=1</lin_ima><can_id>H4UT04052</can_id><can_nam>AALDERS, TIM</can_nam><can_off>H</can_off><can_off_sta>UT</can_off_sta><can_off_dis>04</can_off_dis><can_par_aff>IAP</can_par_aff><can_inc_cha_ope_sea>OPEN</can_inc_cha_ope_sea><can_str1>5306 WEST 10320 NORTH</can_str

Process the data into an XML Tree


In [64]:
for item in page[:10]:
    print item.tag


title
description
timestamp
copyright
can_sum
can_sum
can_sum
can_sum
can_sum
can_sum

Notice that each <can_sum> element encapsulates one candidate's data.


In [65]:
for item in page.xpath("//can_sum")[0]:
    print "<%s>%s</%s>" % (item.tag, str(item.text), item.tag)


<lin_ima>http://www.fec.gov/fecviewer/CandidateCommitteeDetail.do?candidateCommitteeId=H4UT04052&tabIndex=1</lin_ima>
<can_id>H4UT04052</can_id>
<can_nam>AALDERS, TIM</can_nam>
<can_off>H</can_off>
<can_off_sta>UT</can_off_sta>
<can_off_dis>04</can_off_dis>
<can_par_aff>IAP</can_par_aff>
<can_inc_cha_ope_sea>OPEN</can_inc_cha_ope_sea>
<can_str1>5306 WEST 10320 NORTH</can_str1>
<can_str2>None</can_str2>
<can_cit>HIGHLAND</can_cit>
<can_sta>UT</can_sta>
<can_zip>84003</can_zip>
<ind_ite_con>None</ind_ite_con>
<ind_uni_con>None</ind_uni_con>
<ind_con>None</ind_con>
<par_com_con>None</par_com_con>
<oth_com_con>None</oth_com_con>
<can_con>None</can_con>
<tot_con>None</tot_con>
<tra_fro_oth_aut_com>None</tra_fro_oth_aut_com>
<can_loa>None</can_loa>
<oth_loa>None</oth_loa>
<tot_loa>None</tot_loa>
<off_to_ope_exp>None</off_to_ope_exp>
<off_to_fun>None</off_to_fun>
<off_to_leg_acc>None</off_to_leg_acc>
<oth_rec>None</oth_rec>
<tot_rec>None</tot_rec>
<ope_exp>None</ope_exp>
<exe_leg_acc_dis>None</exe_leg_acc_dis>
<fun_dis>None</fun_dis>
<tra_to_oth_aut_com>None</tra_to_oth_aut_com>
<can_loa_rep>None</can_loa_rep>
<oth_loa_rep>None</oth_loa_rep>
<tot_loa_rep>None</tot_loa_rep>
<ind_ref>None</ind_ref>
<par_com_ref>None</par_com_ref>
<oth_com_ref>None</oth_com_ref>
<tot_con_ref>None</tot_con_ref>
<oth_dis>None</oth_dis>
<tot_dis>None</tot_dis>
<cas_on_han_beg_of_per>None</cas_on_han_beg_of_per>
<cas_on_han_clo_of_per>None</cas_on_han_clo_of_per>
<net_con>None</net_con>
<net_ope_exp>None</net_ope_exp>
<deb_owe_by_com>None</deb_owe_by_com>
<deb_owe_to_com>None</deb_owe_to_com>
<cov_sta_dat>None</cov_sta_dat>
<cov_end_dat>None</cov_end_dat>

In [66]:
cand_list = [cand for cand in page.xpath("//can_sum") if cand.xpath("can_off")[0].text=="S"]
lin_ima = [cand.xpath("lin_ima")[0].text for cand in cand_list]
len(lin_ima)


Out[66]:
412

Store the Data in a Pandas Data Frame


In [67]:
senate_cadidate = pd.DataFrame(lin_ima, columns=["lin_ima"])
senate_cadidate["can_id"] = [cand.xpath("can_id")[0].text for cand in cand_list]
senate_cadidate["can_nam"] = [cand.xpath("can_nam")[0].text for cand in cand_list]
senate_cadidate["can_off"] = [cand.xpath("can_off")[0].text for cand in cand_list]
senate_cadidate["can_off_sta"] = [cand.xpath("can_off_sta")[0].text for cand in cand_list]
senate_cadidate["can_par_aff"] = [cand.xpath("can_par_aff")[0].text for cand in cand_list]
senate_cadidate["can_inc_cha_ope_sea"] = [cand.xpath("can_inc_cha_ope_sea")[0].text for cand in cand_list]
senate_cadidate["ind_ite_con"] = [cand.xpath("ind_ite_con")[0].text for cand in cand_list]
senate_cadidate["ind_uni_con"] = [cand.xpath("ind_uni_con")[0].text for cand in cand_list]
senate_cadidate["ind_con"] = [cand.xpath("ind_con")[0].text for cand in cand_list]
senate_cadidate["par_com_con"] = [cand.xpath("par_com_con")[0].text for cand in cand_list]
senate_cadidate["oth_com_con"] = [cand.xpath("oth_com_con")[0].text for cand in cand_list]
senate_cadidate["can_con"] = [cand.xpath("can_con")[0].text for cand in cand_list]
senate_cadidate["tot_con"] = [cand.xpath("tot_con")[0].text for cand in cand_list]
senate_cadidate["tra_fro_oth_aut_com"] = [cand.xpath("tra_fro_oth_aut_com")[0].text for cand in cand_list]
senate_cadidate["can_loa"] = [cand.xpath("can_loa")[0].text for cand in cand_list]
senate_cadidate["oth_loa"] = [cand.xpath("oth_loa")[0].text for cand in cand_list]
senate_cadidate["tot_loa"] = [cand.xpath("tot_loa")[0].text for cand in cand_list]
senate_cadidate["off_to_ope_exp"] = [cand.xpath("off_to_ope_exp")[0].text for cand in cand_list]
senate_cadidate["off_to_fun"] = [cand.xpath("off_to_fun")[0].text for cand in cand_list]
senate_cadidate["off_to_leg_acc"] = [cand.xpath("off_to_leg_acc")[0].text for cand in cand_list]
senate_cadidate["oth_rec"] = [cand.xpath("oth_rec")[0].text for cand in cand_list]
senate_cadidate["tot_rec"] = [cand.xpath("tot_rec")[0].text for cand in cand_list]
senate_cadidate["ope_exp"] = [cand.xpath("ope_exp")[0].text for cand in cand_list]
senate_cadidate["fun_dis"] = [cand.xpath("fun_dis")[0].text for cand in cand_list]
senate_cadidate["exe_leg_acc_dis"] = [cand.xpath("exe_leg_acc_dis")[0].text for cand in cand_list]
senate_cadidate["tra_to_oth_aut_com"] = [cand.xpath("tra_to_oth_aut_com")[0].text for cand in cand_list]
senate_cadidate["can_loa_rep"] = [cand.xpath("can_loa_rep")[0].text for cand in cand_list]
senate_cadidate["oth_loa_rep"] = [cand.xpath("oth_loa_rep")[0].text for cand in cand_list]
senate_cadidate["tot_loa_rep"] = [cand.xpath("tot_loa_rep")[0].text for cand in cand_list]
senate_cadidate["ind_ref"] = [cand.xpath("ind_ref")[0].text for cand in cand_list]
senate_cadidate["par_com_ref"] = [cand.xpath("par_com_ref")[0].text for cand in cand_list]
senate_cadidate["oth_com_ref"] = [cand.xpath("oth_com_ref")[0].text for cand in cand_list]
senate_cadidate["tot_con_ref"] = [cand.xpath("tot_con_ref")[0].text for cand in cand_list]
senate_cadidate["oth_dis"] = [cand.xpath("oth_dis")[0].text for cand in cand_list]
senate_cadidate["tot_dis"] = [cand.xpath("tot_dis")[0].text for cand in cand_list]
senate_cadidate["cas_on_han_beg_of_per"] = [cand.xpath("cas_on_han_beg_of_per")[0].text for cand in cand_list]
senate_cadidate["cas_on_han_clo_of_per"] = [cand.xpath("cas_on_han_clo_of_per")[0].text for cand in cand_list]
senate_cadidate["net_con"] = [cand.xpath("net_con")[0].text for cand in cand_list]
senate_cadidate["net_ope_exp"] = [cand.xpath("net_ope_exp")[0].text for cand in cand_list]
senate_cadidate["deb_owe_by_com"] = [cand.xpath("deb_owe_by_com")[0].text for cand in cand_list]
senate_cadidate["deb_owe_to_com"] = [cand.xpath("deb_owe_to_com")[0].text for cand in cand_list]
senate_cadidate["cov_sta_dat"] = [cand.xpath("cov_sta_dat")[0].text for cand in cand_list]
senate_cadidate["cov_end_dat"] = [cand.xpath("cov_end_dat")[0].text for cand in cand_list]
senate_cadidate


Out[67]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 412 entries, 0 to 411
Data columns (total 44 columns):
lin_ima                  412  non-null values
can_id                   412  non-null values
can_nam                  412  non-null values
can_off                  412  non-null values
can_off_sta              412  non-null values
can_par_aff              412  non-null values
can_inc_cha_ope_sea      411  non-null values
ind_ite_con              197  non-null values
ind_uni_con              186  non-null values
ind_con                  204  non-null values
par_com_con              37  non-null values
oth_com_con              116  non-null values
can_con                  101  non-null values
tot_con                  220  non-null values
tra_fro_oth_aut_com      61  non-null values
can_loa                  103  non-null values
oth_loa                  11  non-null values
tot_loa                  102  non-null values
off_to_ope_exp           104  non-null values
off_to_fun               0  non-null values
off_to_leg_acc           0  non-null values
oth_rec                  76  non-null values
tot_rec                  223  non-null values
ope_exp                  221  non-null values
fun_dis                  0  non-null values
exe_leg_acc_dis          0  non-null values
tra_to_oth_aut_com       22  non-null values
can_loa_rep              34  non-null values
oth_loa_rep              4  non-null values
tot_loa_rep              38  non-null values
ind_ref                  117  non-null values
par_com_ref              4  non-null values
oth_com_ref              47  non-null values
tot_con_ref              121  non-null values
oth_dis                  86  non-null values
tot_dis                  221  non-null values
cas_on_han_beg_of_per    69  non-null values
cas_on_han_clo_of_per    192  non-null values
net_con                  215  non-null values
net_ope_exp              217  non-null values
deb_owe_by_com           108  non-null values
deb_owe_to_com           4  non-null values
cov_sta_dat              231  non-null values
cov_end_dat              231  non-null values
dtypes: object(44)
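
The cell above builds one column per line, with the same expression repeated for every field. A more compact way to express the same idea is to loop over a list of field names. The sketch below does this with the standard library's `xml.etree.ElementTree` on a tiny inline sample. The first record's values come from the output shown earlier; the second record (`DOE, JANE`, `S0XX00000`) is invented for illustration, and the `senate_records` helper and shortened `FIELDS` list are assumptions rather than part of the notebook.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the FEC file. Only the first record's values appear in
# the notebook output; the second record is invented for illustration.
SAMPLE = """
<data>
  <can_sum>
    <can_id>H4UT04052</can_id>
    <can_nam>AALDERS, TIM</can_nam>
    <can_off>H</can_off>
    <can_off_sta>UT</can_off_sta>
    <tot_rec></tot_rec>
  </can_sum>
  <can_sum>
    <can_id>S0XX00000</can_id>
    <can_nam>DOE, JANE</can_nam>
    <can_off>S</can_off>
    <can_off_sta>XX</can_off_sta>
    <tot_rec>$1,234.50</tot_rec>
  </can_sum>
</data>
"""

# Shortened field list; the notebook extracts 44 fields.
FIELDS = ["can_id", "can_nam", "can_off", "can_off_sta", "tot_rec"]


def senate_records(xml_text, fields=FIELDS):
    """Return one dict per <can_sum> whose office code is "S" (Senate)."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for cand in root.iter("can_sum"):
        if cand.findtext("can_off") != "S":
            continue
        # Empty or missing elements become None, like the null cells above.
        records.append({name: (cand.findtext(name) or None) for name in fields})
    return records


records = senate_records(SAMPLE)
print(records)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(records)` to get one column per field.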

Retrieve YouTube Data for All Candidates


In [69]:
def get_state_data(candidates):
    data_set = get_2014_data(candidates)
    t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
                   aggfunc='sum', rows="candidate_name")
    t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
    t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
    t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
    t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
    t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
    print t_ds
    return t_ds

def fix_name(val_name):
    val_names = val_name.split(", ")
    return "%s %s" % (val_names[1].split(" ")[0].capitalize(), val_names[0].capitalize())
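
To make the two helpers above concrete, here is a small standalone sketch in plain Python 3 that can be run without pandas or the YouTube API. `fix_name` mirrors the logic of the function above; `add_shares` is a hypothetical name for the like ratio and view share that `get_state_data` computes. The Alaska figures are copied from the output printed further down; the input `MCCONNELL, MITCH` is an assumed example of the FEC name format.

```python
def fix_name(val_name):
    """Turn "LAST, FIRST MIDDLE" into "First Last"."""
    val_names = val_name.split(", ")
    first = val_names[1].split(" ")[0]  # drop middle names and initials
    return "%s %s" % (first.capitalize(), val_names[0].capitalize())


def add_shares(stats):
    """Compute the like ratio and view share for each candidate.

    Note: a candidate with no likes and no dislikes would raise
    ZeroDivisionError here, where pandas would produce NaN instead.
    """
    total_views = sum(s["viewCount"] for s in stats.values())
    shares = {}
    for name, s in stats.items():
        shares[name] = {
            "like_dislike_r": s["likeCount"] / (s["likeCount"] + s["dislikeCount"]),
            "views_share": s["viewCount"] / total_views,
        }
    return shares


# Alaska totals as printed in the notebook output.
alaska = {
    "Dan Sullivan": {"likeCount": 496, "dislikeCount": 90, "viewCount": 189278},
    "Mark Begich": {"likeCount": 157, "dislikeCount": 96, "viewCount": 104563},
}

result = add_shares(alaska)
for name, values in result.items():
    print(name, round(values["like_dislike_r"], 6), round(values["views_share"], 6))

print(fix_name("AALDERS, TIM"))
print(fix_name("MCCONNELL, MITCH"))
```

Because `capitalize` lowercases everything after the first letter, a surname such as "MCCONNELL" becomes "Mcconnell", which is the spelling that appears in the results table.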

In [79]:
values_list = []
for index, state in zip(class_2_senators.index, class_2_senators["state"]):
    print "%s: %s" % (state,
                           class_2_senators["member_full"][index])
    candidates = senate_cadidate[senate_cadidate["can_off_sta"]==state]
    candidates = candidates[~candidates["tot_rec"].isnull()]  # build the mask from the filtered frame itself
    candidates["tot_rec_num"] = candidates["tot_rec"].apply(lambda x: x[1:].replace(",","")).astype(np.float64)
    top_candidates = candidates.sort("tot_rec_num", ascending=False)[:2][["can_nam",
                                                                      "can_par_aff",
                                                                      "can_inc_cha_ope_sea",
                                                                      "tot_rec_num",
                                                                      "can_off_sta"]]
    top_candidates["full_name"] = [fix_name(name) for name in top_candidates.values[:,0]]
    top_candidates = top_candidates.sort("full_name")
    print top_candidates["full_name"]
    try:
        ds = get_state_data([fix_name(name) for name in top_candidates.values[:,0]])
        ds["state"] = state
        ds["party"] = top_candidates["can_par_aff"].values
        ds["donations"] = top_candidates["tot_rec_num"].values
        values_list.append(ds)
    except Exception:  # skip states where any step above fails
        print "NA"
        
        
sentate_2014 = pd.concat(values_list)
sentate_2014


AK: Begich (D-AK)
359    Dan Sullivan
27      Mark Begich
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Dan Sullivan             228            90              0        496     189278        0.846416     0.644151   
Mark Begich               65            96              0        157     104563        0.620553     0.355849   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Dan Sullivan      0.778157     0.759571        0.483871  
Mark Begich       0.221843     0.240429        0.516129  
AL: Sessions (R-AL)
335    Jeff Sessions
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Jeff Sessions            137            10              0         29       4800         0.74359            1   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Jeff Sessions            1            1               1  
AR: Pryor (D-AR)
290       Mark Pryor
89     Thomas Cotton
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Mark Pryor               139            37              0        175     228610        0.825472     0.263103   
Thomas Cotton            152            73              0        270     640288        0.787172     0.736897   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Mark Pryor        0.477663     0.393258        0.336364  
Thomas Cotton     0.522337     0.606742        0.663636  
CO: Udall (D-CO)
137    Cory Gardner
375      Mark Udall
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Cory Gardner             327           157              0        467     282673        0.748397     0.659577   
Mark Udall               263           455              0        411     145894        0.474596     0.340423   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Cory Gardner      0.554237     0.531891        0.256536  
Mark Udall        0.445763     0.468109        0.743464  
DE: Coons (D-DE)
85     Christopher Coons
379           Kevin Wade
Name: full_name, dtype: object
                   commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                    
Christopher Coons           200            20              0        110      20040        0.846154      0.54844   
Kevin Wade                  110            30              0        190      16500        0.863636      0.45156   

                   msgs_share  likes_share  dislikes_share  
candidate_name                                              
Christopher Coons    0.645161     0.366667             0.4  
Kevin Wade           0.354839     0.633333             0.6  
GA: Chambliss (R-GA)
197    John Kingston
261        Mary Nunn
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
John Kingston            113             4              0         60       2597        0.937500     0.727247   
Mary Nunn                 70             8              0         18        974        0.692308     0.272753   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
John Kingston     0.617486     0.769231        0.333333  
Mary Nunn         0.382514     0.230769        0.666667  
IA: Harkin (D-IA)
43     Bruce Braley
180     Mark Jacobs
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bruce Braley             260            70              0        278      87802        0.798851     0.056535   
Mark Jacobs             7341          1053              0      66086    1465259        0.984316     0.943465   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Bruce Braley      0.034206     0.004189        0.062333  
Mark Jacobs       0.965794     0.995811        0.937667  
ID: Risch (R-ID)
253    Briane Mitchell
306        James Risch
Name: full_name, dtype: object
NA
IL: Durbin (D-IL)
264    James Oberweis
115    Richard Durbin
Name: full_name, dtype: object
NA
KS: Roberts (R-KS)
403    Milton Wolf
307    Pat Roberts
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Milton Wolf               20            20              0         20       5210        0.500000     0.039943   
Pat Roberts              488            98              0       1000     125227        0.910747     0.960057   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Milton Wolf        0.03937     0.019608        0.169492  
Pat Roberts        0.96063     0.980392        0.830508  
KY: McConnell (R-KY)
152      Alison Grimes
239    Mitch Mcconnell
Name: full_name, dtype: object
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Alison Grimes            1362           376              0       2797     706090        0.881500     0.432752   
Mitch Mcconnell          2247           291              0       4839     925538        0.943275     0.567248   

                 msgs_share  likes_share  dislikes_share  
candidate_name                                            
Alison Grimes       0.37739     0.366291        0.563718  
Mitch Mcconnell     0.62261     0.633709        0.436282  
LA: Landrieu (D-LA)
206      Mary Landrieu
69     William Cassidy
Name: full_name, dtype: object
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Mary Landrieu             744            79              0       2113     407306        0.963960     0.984916   
William Cassidy            84             1              0         94       6238        0.989474     0.015084   

                 msgs_share  likes_share  dislikes_share  
candidate_name                                            
Mary Landrieu      0.898551     0.957408          0.9875  
William Cassidy    0.101449     0.042592          0.0125  
MA: Markey (D-MA)
231    Edward Markey
146    Gabriel Gomez
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Edward Markey              0            10              0         70       7994        0.875000     0.004272   
Gabriel Gomez            920           515              0       1235    1863296        0.705714     0.995728   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Edward Markey            0      0.05364        0.019048  
Gabriel Gomez            1      0.94636        0.980952  
ME: Collins (R-ME)
29    Shenna Bellows
80     Susan Collins
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Shenna Bellows            15             0              0          6      57891         1.00000     0.163068   
Susan Collins             90            18              0        351     297121         0.95122     0.836932   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Shenna Bellows    0.142857     0.016807               0  
Susan Collins     0.857143     0.983193               1  
MI: Levin (D-MI)
280    Gary Peters
205     Terri Land
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Gary Peters               89            45              0        424      37748        0.904051     0.012098   
Terri Land               130           558              0       1784    3082566        0.761742     0.987902   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Gary Peters       0.406393     0.192029        0.074627  
Terri Land        0.593607     0.807971        0.925373  
MN: Franken (D-MN)
133          Al Franken
242    Michael Mcfadden
Name: full_name, dtype: object
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Al Franken                 186            79              0        235     262735        0.748408     0.894219   
Michael Mcfadden            57            40              0        106      31080        0.726027     0.105781   

                  msgs_share  likes_share  dislikes_share  
candidate_name                                             
Al Franken          0.765432      0.68915        0.663866  
Michael Mcfadden    0.234568      0.31085        0.336134  
MS: Cochran (R-MS)
241    Christopher Mcdaniel
79             Thad Cochran
Name: full_name, dtype: object
                      commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                       
Christopher Mcdaniel             0             0              0          6       1062        1.000000     0.061698   
Thad Cochran                   109            10              0         55      16151        0.846154     0.938302   

                      msgs_share  likes_share  dislikes_share  
candidate_name                                                 
Christopher Mcdaniel           0     0.098361               0  
Thad Cochran                   1     0.901639               1  
MT: Walsh (D-MT)
384       John Walsh
104    Steven Daines
Name: full_name, dtype: object
NA
NC: Hagan (D-NC)
155      Kay Hagan
370    Thom Tillis
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Kay Hagan                278           128              0        271      51704        0.679198     0.056229   
Thom Tillis              231           206              0        561     867825        0.731421     0.943771   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Kay Hagan         0.546169     0.325721        0.383234  
Thom Tillis       0.453831     0.674279        0.616766  
NE: Johanns (R-NE)
325    Benjamin Sasse
111      Sid Dinsdale
Name: full_name, dtype: object
NA
NH: Shaheen (D-NH)
336    Jeanne Shaheen
50        Scott Brown
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Jeanne Shaheen            30            34              0         84      37123        0.711864     0.085959   
Scott Brown              721            96              0       2465     394746        0.962515     0.914041   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Jeanne Shaheen    0.039947     0.032954        0.261538  
Scott Brown       0.960053     0.967046        0.738462  
NJ: Booker (D-NJ)
36       Cory Booker
272    Frank Pallone
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Cory Booker              299            43              0        688      93738        0.941176     0.997956   
Frank Pallone              1             0              0          0        192             inf     0.002044   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Cory Booker       0.996667            1               1  
Frank Pallone     0.003333            0               0  
NM: Udall (D-NM)
391    Allen Weh
376    Tom Udall
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Allen Weh                800            70              0        250    1950780        0.781250     0.987187   
Tom Udall                700            40              0        670      25320        0.943662     0.012813   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Allen Weh         0.533333     0.271739        0.636364  
Tom Udall         0.466667     0.728261        0.363636  
OK: Inhofe (R-OK)
178      James Inhofe
207    James Lankford
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
James Inhofe            1927           412              0       2116     172284        0.837025     0.985471   
James Lankford            30            10              0         10       2540        0.500000     0.014529   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
James Inhofe       0.98467     0.995296        0.976303  
James Lankford     0.01533     0.004704        0.023697  
OR: Merkley (D-OR)
249    Jeffrey Merkley
392       Monica Wehby
Name: full_name, dtype: object
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Jeffrey Merkley           156            36              0        392      43632        0.915888     0.310716   
Monica Wehby              135           305              0        660      96792        0.683938     0.689284   

                 msgs_share  likes_share  dislikes_share  
candidate_name                                            
Jeffrey Merkley    0.536082     0.372624        0.105572  
Monica Wehby       0.463918     0.627376        0.894428  
RI: Reed (D-RI)
301        Jack Reed
410    Mark Zaccaria
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Jack Reed                 70             9              0        180      41526        0.952381     0.945277   
Mark Zaccaria             10             0              0          0       2404             inf     0.054723   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Jack Reed            0.875            1               1  
Mark Zaccaria        0.125            0               0  
SC: Graham (R-SC)
148    Lindsey Graham
333     Timothy Scott
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Lindsey Graham          1274           184              0       1555     125933        0.894192     0.877735   
Timothy Scott            198             0              0        332      17542        1.000000     0.122265   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
Lindsey Graham    0.865489     0.824059               1  
Timothy Scott     0.134511     0.175941               0  
SD: Johnson (D-SD)
40     Annette Bosworth
315       Marion Rounds
Name: full_name, dtype: object
NA
TN: Alexander (R-TN)
131       George Flinn
8      Lamar Alexander
Name: full_name, dtype: object
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
George Flinn              279            27              0         51       5247        0.653846     0.437505   
Lamar Alexander            37             4              0         49       6746        0.924528     0.562495   

                 msgs_share  likes_share  dislikes_share  
candidate_name                                            
George Flinn       0.882911         0.51        0.870968  
Lamar Alexander    0.117089         0.49        0.129032  
TX: Cornyn (R-TX)
7     David Alameel
88      John Cornyn
Name: full_name, dtype: object
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
David Alameel              3             2              0          2        105        0.500000     0.004871   
John Cornyn              166            20              0        260      21450        0.928571     0.995129   

                msgs_share  likes_share  dislikes_share  
candidate_name                                           
David Alameel     0.017751     0.007634        0.090909  
John Cornyn       0.982249     0.992366        0.909091  
VA: Warner (D-VA)
140    Edward Gillespie
386         Mark Warner
Name: full_name, dtype: object
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Edward Gillespie             2             0              0          8        434        1.000000     0.021218   
Mark Warner                 53             9              0         44      20020        0.830189     0.978782   

                  msgs_share  likes_share  dislikes_share  
candidate_name                                             
Edward Gillespie    0.036364     0.153846               0  
Mark Warner         0.963636     0.846154               1  
WV: Rockefeller (D-WV)
366    Natalie Tennant
61      Shelley Capito
Name: full_name, dtype: object
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Natalie Tennant           350           271              0        830     149271        0.753860     0.945921   
Shelley Capito             48            24              0        152       8534        0.863636     0.054079   

                 msgs_share  likes_share  dislikes_share  
candidate_name                                            
Natalie Tennant    0.879397     0.845214        0.918644  
Shelley Capito     0.120603     0.154786        0.081356  
WY: Enzi (R-WY)
72     Elizabeth Cheney
121        Michael Enzi
Name: full_name, dtype: object
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Elizabeth Cheney            48             0              0         30       4482               1     0.749875   
Michael Enzi                25             0              0         70       1495               1     0.250125   

                  msgs_share  likes_share  dislikes_share  
candidate_name                                             
Elizabeth Cheney    0.657534          0.3             inf  
Michael Enzi        0.342466          0.7             inf  
Out[79]:
<class 'pandas.core.frame.DataFrame'>
Index: 55 entries, Dan Sullivan to Michael Enzi
Data columns (total 13 columns):
commentCount      55  non-null values
dislikeCount      55  non-null values
favoriteCount     55  non-null values
likeCount         55  non-null values
viewCount         55  non-null values
like_dislike_r    55  non-null values
views_share       55  non-null values
msgs_share        55  non-null values
likes_share       55  non-null values
dislikes_share    55  non-null values
state             55  non-null values
party             55  non-null values
donations         55  non-null values
dtypes: float64(6), int64(5), object(2)

In [94]:
class_2_senators["state"]


Out[94]:
4     AK
84    AL
73    AR
91    CO
22    DE
17    GA
38    IA
76    ID
28    IL
77    KS
62    KY
54    LA
59    MA
21    ME
57    MI
33    MN
20    MS
94    MT
37    NC
47    NE
85    NH
8     NJ
92    NM
45    OK
64    OR
74    RI
35    SC
49    SD
0     TN
24    TX
95    VA
78    WV
29    WY
Name: state, dtype: object

In [97]:
import numpy as np

x_column = "views_share"   # candidate's share of the state's YouTube views
y_column = "viewCount"     # absolute number of views (plotted on a log scale)
s_column = "donations"     # drives the marker size

# Marker color for each party code
color_dict = {"DEM": "b", "REP": "r", "IND": "g", "NPA": "g", "DFL": "g"}

plt.figure(figsize=(18, 12))

# One scatter series per party; marker area scales with donations
for party in sentate_2014["party"].unique():
    cands = sentate_2014[sentate_2014["party"] == party]
    x = cands[x_column]
    y = cands[y_column]
    size = cands[s_column] / 3000000
    plt.scatter(x, y, s=np.array(size) * 1000, c=color_dict[party], alpha=0.5)

print plt.ylim()[1]
# Vertical line marking a 50% share of views
plt.vlines(0.5, ymin=1, ymax=plt.ylim()[1] * 0.9)

# A candidate with more than half of the state's views is a projected winner
projected = sentate_2014[sentate_2014[x_column] > 0.5]
projected_winners = projected["party"].value_counts()

# Label every point with its state code
for name, row in sentate_2014.iterrows():
    plt.annotate(row["state"], xy=(row[x_column], row[y_column]))

# One summary line per projected winner
result_text = []
for name, row in projected.iterrows():
    result_text.append("%s: %s (%s) - %0.1f%%" % (row["state"], name, row["party"], row["views_share"] * 100.))
result_text = "\n".join(result_text)
projected_winners = "\n".join(["%s:%s" % (party, value) for party, value in zip(projected_winners.index, projected_winners.values)])

plt.annotate(projected_winners, xy=(.65, plt.ylim()[1] * 0.8))
plt.annotate(result_text, xy=(.8, 1.5))

plt.xlabel(x_column)
plt.ylabel(y_column + " (Log Scale)")
plt.grid()
plt.yscale("log")
#plt.axis("tight")
plt.title("Senate 2014 Elections Forecast (Size is relative and represents the amount of donations)")
plt.show()


3500000.0
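
The forecast rule behind the plot is simple: a candidate whose `views_share` is above 0.5 counts as the projected winner of the state, and the winners are then tallied per party. Below is a minimal plain-Python sketch of that rule, not the notebook's own code. The `rows` list and the `project_winners` helper are hypothetical names introduced for illustration. The shares are copied from the per-state output earlier in this notebook; the party codes of the trailing candidates are assumptions, since that output does not show them.

```python
from collections import Counter

# (state, candidate, party, views_share)
# Shares come from the per-state output; party codes for the trailing
# candidates are assumed, because the output does not list them.
rows = [
    ("NC", "Kay Hagan", "DEM", 0.056229),
    ("NC", "Thom Tillis", "REP", 0.943771),
    ("NJ", "Cory Booker", "DEM", 0.997956),
    ("NJ", "Frank Pallone", "DEM", 0.002044),
    ("OR", "Jeffrey Merkley", "DEM", 0.310716),
    ("OR", "Monica Wehby", "REP", 0.689284),
]


def project_winners(rows, threshold=0.5):
    """Return the rows whose views_share is strictly above the threshold."""
    return [row for row in rows if row[3] > threshold]


winners = project_winners(rows)

# Count projected winners per party, like value_counts() on the "party" column
party_counts = Counter(party for _, _, party, _ in winners)

for state, name, party, share in winners:
    print("%s: %s (%s) - %0.1f%%" % (state, name, party, share * 100.0))
print(dict(party_counts))
```

The strict comparison means that a share of exactly 0.5 projects no winner, and the threshold is only meaningful when each state is reduced to a two-candidate comparison.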

In [58]:
sentate_2014[sentate_2014[x_column]>0.5]


Out[58]:
                   commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  msgs_share  likes_share  dislikes_share  state  party    donations
candidate_name
Dan Sullivan                360            94              0        623     165715        0.868898     0.613568    0.814480     0.774876        0.494737     AK    DEM   6340422.00
Jeff Sessions               142            10              0         36       5114        0.782609     1.000000    1.000000     1.000000        1.000000     AL    REP   1115688.00
Thomas Cotton               100            91              0        195     797919        0.681818     0.814496    0.434783     0.537190        0.722222     AR    REP   7097224.06
Cory Gardner                316           167              0        450     234223        0.729335     0.628876    0.633267     0.530660        0.268058     CO    DEM  10420571.00
Christopher Coons           190            20              0        110      19950        0.846154     0.559278    0.627063     0.379310        0.400000     DE    DEM   4173447.00
John Kingston                37             2              0         17        717        0.894737     0.540724    0.451220     0.447368        1.000000     GA    DEM   9211931.00
Mark Jacobs                7334          1052              0      66064    1464016        0.984326     0.943058    0.959571     0.995765        0.932624     IA    REP   4810813.00
Pat Roberts                 256            83              0        374      97678        0.818381     0.943867    0.733524     0.869767        0.768519     KS    REP   1068018.00
Mitch Mcconnell            2197           272              0       4760     893077        0.945946     0.569208    0.632230     0.640387        0.428346     KY    DEM  11353760.00
Mary Landrieu               745            84              0       2015     395161        0.959981     0.986888    0.937107     0.971084        1.000000     LA    DEM  10190144.00
Gabriel Gomez               899           514              0       1239    1855996        0.706788     0.995771    1.000000     0.946524        0.980916     MA    REP   4755654.00
Susan Collins                81             9              0        351     291078        0.975000     0.834797    0.658537     0.983193        1.000000     ME    DEM   1333016.00
Terri Land                  110           560              0       1660    3077280        0.747748     0.988351    0.578947     0.783019        0.918033     MI    DEM   6994603.00
Al Franken                  141            71              0        208     260800        0.745520     0.897435    0.762162     0.753623        0.639640     MN    DFL  15126268.00
Thad Cochran                 74            10              0         35      15199        0.777778     0.955972    1.000000     0.897436        1.000000     MS    REP   2727209.00
Thom Tillis                 253           194              0        586     797943        0.751282     0.950310    0.496078     0.716381        0.619808     NC    REP   4764110.00
Scott Brown                 708            92              0       2456     388637        0.963893     0.916956    0.964578     0.967310        0.736000     NH    REP   3686708.00
Cory Booker                 355            60              0        894     130143        0.937107     0.998527    0.997191     1.000000        1.000000     NJ    DEM  16167874.00
Allen Weh                   821            71              0        256    1949518        0.782875     0.987909    0.539776     0.279476        0.702970     NM    DEM   5050539.00
James Inhofe               1905           406              0       2094     170226        0.837600     0.981288    0.968480     0.984023        0.966667     OK    REP   2811701.00
Monica Wehby                115           295              0        660      95280        0.691099     0.702572    0.481172     0.662651        0.891239     OR    REP   2049732.00
Jack Reed                    90             4              0        128      19412        0.969697     0.910763    0.865385     0.969697        1.000000     RI    DEM   2833802.97
Lindsey Graham             1089           135              0       1412     119855        0.912734     0.891586    0.840927     0.811494        1.000000     SC    REP   6788544.00
Lamar Alexander              34             4              0         48       6291        0.923077     0.590538    0.161905     0.578313        0.173913     TN    REP   1812250.00
John Cornyn                 166            20              0        260      21110        0.928571     0.995943    0.982249     0.996169        0.909091     TX    DEM   9673572.00
Natalie Tennant             210           110              0        690     159430        0.862500     0.969238    0.860656     0.873418        0.873016     WV    REP   5482547.00
Elizabeth Cheney             30             0              0         20       3900        1.000000     0.722892    0.545455     0.222222             inf     WY    REP   3016825.00

In [52]:



commentCount            360
dislikeCount             94
favoriteCount             0
likeCount               623
viewCount            165715
like_dislike_r    0.8688982
views_share       0.6135684
msgs_share        0.8144796
likes_share       0.7748756
dislikes_share    0.4947368
state                    AK
party                   DEM
donations           6340422
Name: Dan Sullivan, dtype: object
commentCount            142
dislikeCount             10
favoriteCount             0
likeCount                36
viewCount              5114
like_dislike_r    0.7826087
views_share               1
msgs_share                1
likes_share               1
dislikes_share            1
state                    AL
party                   REP
donations           1115688
Name: Jeff Sessions, dtype: object
commentCount            100
dislikeCount             91
favoriteCount             0
likeCount               195
viewCount            797919
like_dislike_r    0.6818182
views_share       0.8144956
msgs_share        0.4347826
likes_share       0.5371901
dislikes_share    0.7222222
state                    AR
party                   REP
donations           7097224
Name: Thomas Cotton, dtype: object
commentCount               316
dislikeCount               167
favoriteCount                0
likeCount                  450
viewCount               234223
like_dislike_r       0.7293355
views_share          0.6288761
msgs_share           0.6332665
likes_share          0.5306604
dislikes_share       0.2680578
state                       CO
party                      DEM
donations         1.042057e+07
Name: Cory Gardner, dtype: object
commentCount            190
dislikeCount             20
favoriteCount             0
likeCount               110
viewCount             19950
like_dislike_r    0.8461538
views_share       0.5592778
msgs_share        0.6270627
likes_share       0.3793103
dislikes_share          0.4
state                    DE
party                   DEM
donations           4173447
Name: Christopher Coons, dtype: object
commentCount             37
dislikeCount              2
favoriteCount             0
likeCount                17
viewCount               717
like_dislike_r    0.8947368
views_share        0.540724
msgs_share        0.4512195
likes_share       0.4473684
dislikes_share            1
state                    GA
party                   DEM
donations           9211931
Name: John Kingston, dtype: object
commentCount           7334
dislikeCount           1052
favoriteCount             0
likeCount             66064
viewCount           1464016
like_dislike_r    0.9843256
views_share       0.9430583
msgs_share        0.9595708
likes_share       0.9957646
dislikes_share    0.9326241
state                    IA
party                   REP
donations           4810813
Name: Mark Jacobs, dtype: object
commentCount            256
dislikeCount             83
favoriteCount             0
likeCount               374
viewCount             97678
like_dislike_r    0.8183807
views_share       0.9438673
msgs_share        0.7335244
likes_share       0.8697674
dislikes_share    0.7685185
state                    KS
party                   REP
donations           1068018
Name: Pat Roberts, dtype: object
commentCount              2197
dislikeCount               272
favoriteCount                0
likeCount                 4760
viewCount               893077
like_dislike_r       0.9459459
views_share          0.5692079
msgs_share           0.6322302
likes_share          0.6403875
dislikes_share       0.4283465
state                       KY
party                      DEM
donations         1.135376e+07
Name: Mitch Mcconnell, dtype: object
commentCount               745
dislikeCount                84
favoriteCount                0
likeCount                 2015
viewCount               395161
like_dislike_r       0.9599809
views_share          0.9868885
msgs_share           0.9371069
likes_share          0.9710843
dislikes_share               1
state                       LA
party                      DEM
donations         1.019014e+07
Name: Mary Landrieu, dtype: object
commentCount            899
dislikeCount            514
favoriteCount             0
likeCount              1239
viewCount           1855996
like_dislike_r    0.7067884
views_share       0.9957712
msgs_share                1
likes_share       0.9465241
dislikes_share     0.980916
state                    MA
party                   REP
donations           4755654
Name: Gabriel Gomez, dtype: object
commentCount             81
dislikeCount              9
favoriteCount             0
likeCount               351
viewCount            291078
like_dislike_r        0.975
views_share       0.8347974
msgs_share        0.6585366
likes_share       0.9831933
dislikes_share            1
state                    ME
party                   DEM
donations           1333016
Name: Susan Collins, dtype: object
commentCount            110
dislikeCount            560
favoriteCount             0
likeCount              1660
viewCount           3077280
like_dislike_r    0.7477477
views_share       0.9883509
msgs_share        0.5789474
likes_share       0.7830189
dislikes_share    0.9180328
state                    MI
party                   DEM
donations           6994603
Name: Terri Land, dtype: object
commentCount               141
dislikeCount                71
favoriteCount                0
likeCount                  208
viewCount               260800
like_dislike_r       0.7455197
views_share           0.897435
msgs_share           0.7621622
likes_share          0.7536232
dislikes_share       0.6396396
state                       MN
party                      DFL
donations         1.512627e+07
Name: Al Franken, dtype: object
commentCount             74
dislikeCount             10
favoriteCount             0
likeCount                35
viewCount             15199
like_dislike_r    0.7777778
views_share       0.9559721
msgs_share                1
likes_share       0.8974359
dislikes_share            1
state                    MS
party                   REP
donations           2727209
Name: Thad Cochran, dtype: object
commentCount            253
dislikeCount            194
favoriteCount             0
likeCount               586
viewCount            797943
like_dislike_r    0.7512821
views_share         0.95031
msgs_share        0.4960784
likes_share       0.7163814
dislikes_share    0.6198083
state                    NC
party                   REP
donations           4764110
Name: Thom Tillis, dtype: object
commentCount            708
dislikeCount             92
favoriteCount             0
likeCount              2456
viewCount            388637
like_dislike_r    0.9638932
views_share       0.9169557
msgs_share        0.9645777
likes_share         0.96731
dislikes_share        0.736
state                    NH
party                   REP
donations           3686708
Name: Scott Brown, dtype: object
commentCount               355
dislikeCount                60
favoriteCount                0
likeCount                  894
viewCount               130143
like_dislike_r       0.9371069
views_share          0.9985269
msgs_share            0.997191
likes_share                  1
dislikes_share               1
state                       NJ
party                      DEM
donations         1.616787e+07
Name: Cory Booker, dtype: object
commentCount            821
dislikeCount             71
favoriteCount             0
likeCount               256
viewCount           1949518
like_dislike_r    0.7828746
views_share       0.9879091
msgs_share        0.5397765
likes_share        0.279476
dislikes_share    0.7029703
state                    NM
party                   DEM
donations           5050539
Name: Allen Weh, dtype: object
commentCount           1905
dislikeCount            406
favoriteCount             0
likeCount              2094
viewCount            170226
like_dislike_r       0.8376
views_share        0.981288
msgs_share        0.9684799
likes_share       0.9840226
dislikes_share    0.9666667
state                    OK
party                   REP
donations           2811701
Name: James Inhofe, dtype: object
commentCount            115
dislikeCount            295
favoriteCount             0
likeCount               660
viewCount             95280
like_dislike_r    0.6910995
views_share        0.702572
msgs_share        0.4811715
likes_share       0.6626506
dislikes_share    0.8912387
state                    OR
party                   REP
donations           2049732
Name: Monica Wehby, dtype: object
commentCount             90
dislikeCount              4
favoriteCount             0
likeCount               128
viewCount             19412
like_dislike_r     0.969697
views_share       0.9107629
msgs_share        0.8653846
likes_share        0.969697
dislikes_share            1
state                    RI
party                   DEM
donations           2833803
Name: Jack Reed, dtype: object
commentCount           1089
dislikeCount            135
favoriteCount             0
likeCount              1412
viewCount            119855
like_dislike_r    0.9127343
views_share       0.8915859
msgs_share        0.8409266
likes_share       0.8114943
dislikes_share            1
state                    SC
party                   REP
donations           6788544
Name: Lindsey Graham, dtype: object
commentCount             34
dislikeCount              4
favoriteCount             0
likeCount                48
viewCount              6291
like_dislike_r    0.9230769
views_share       0.5905379
msgs_share        0.1619048
likes_share       0.5783133
dislikes_share     0.173913
state                    TN
party                   REP
donations           1812250
Name: Lamar Alexander, dtype: object
commentCount            166
dislikeCount             20
favoriteCount             0
likeCount               260
viewCount             21110
like_dislike_r    0.9285714
views_share       0.9959426
msgs_share        0.9822485
likes_share       0.9961686
dislikes_share    0.9090909
state                    TX
party                   DEM
donations           9673572
Name: John Cornyn, dtype: object
commentCount            210
dislikeCount            110
favoriteCount             0
likeCount               690
viewCount            159430
like_dislike_r       0.8625
views_share       0.9692383
msgs_share        0.8606557
likes_share       0.8734177
dislikes_share    0.8730159
state                    WV
party                   REP
donations           5482547
Name: Natalie Tennant, dtype: object
commentCount             30
dislikeCount              0
favoriteCount             0
likeCount                20
viewCount              3900
like_dislike_r            1
views_share       0.7228916
msgs_share        0.5454545
likes_share       0.2222222
dislikes_share          inf
state                    WY
party                   REP
donations           3016825
Name: Elizabeth Cheney, dtype: object

In [46]:
len(sentate_2014["state"].unique())


Out[46]:
27

Check 2012 Senate Elections


In [35]:
def get_state_data(candidates):
    data_set = get_2012_data(candidates)
    t_ds = pd.pivot_table(data_set, values=["commentCount", "favoriteCount", "dislikeCount", "likeCount", "viewCount"],
                   aggfunc='sum', rows="candidate_name")
    t_ds["like_dislike_r"] = t_ds["likeCount"] / (t_ds["dislikeCount"] + t_ds["likeCount"])
    t_ds["views_share"] = t_ds["viewCount"] / t_ds["viewCount"].sum()
    t_ds["msgs_share"] = t_ds["commentCount"] / t_ds["commentCount"].sum()
    t_ds["likes_share"] = t_ds["likeCount"] / t_ds["likeCount"].sum()
    t_ds["dislikes_share"] = t_ds["dislikeCount"] / t_ds["dislikeCount"].sum()
    # Sentiment analysis: mean TextBlob polarity of each candidate's video titles
    t_ds["sentiment"] = pd.Series()
    for cand in candidates:
        titles = data_set[data_set["candidate_name"] == cand]["title"]
        t_ds["sentiment"][cand] = np.mean([TextBlob(title).polarity for title in titles])
    
    print t_ds
    return t_ds
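
The ratios in `get_state_data` are plain divisions: `like_dislike_r` is likes over likes plus dislikes, and each `*_share` column is a candidate's count divided by the total across the candidates being compared. When a total is zero the division is undefined, which is why values such as `inf` appear in the WY output earlier in this notebook. The sketch below reproduces the arithmetic in plain Python with a guarded division. `safe_ratio` and `add_shares` are hypothetical helpers, and returning `None` for a zero denominator is a design choice made here, not the notebook's behavior. The raw counts are copied from the MI and WY output.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    if denominator == 0:
        return None
    return float(numerator) / denominator


def add_shares(stats):
    """Map candidate name -> raw counts to the same mapping with ratio fields added."""
    keys = ("viewCount", "commentCount", "likeCount", "dislikeCount")
    totals = {key: sum(s[key] for s in stats.values()) for key in keys}
    result = {}
    for name, s in stats.items():
        row = dict(s)
        row["like_dislike_r"] = safe_ratio(s["likeCount"], s["likeCount"] + s["dislikeCount"])
        row["views_share"] = safe_ratio(s["viewCount"], totals["viewCount"])
        row["msgs_share"] = safe_ratio(s["commentCount"], totals["commentCount"])
        row["likes_share"] = safe_ratio(s["likeCount"], totals["likeCount"])
        row["dislikes_share"] = safe_ratio(s["dislikeCount"], totals["dislikeCount"])
        result[name] = row
    return result


# Raw counts copied from the MI output
mi = {
    "Gary Peters": {"commentCount": 89, "dislikeCount": 45, "likeCount": 424, "viewCount": 37748},
    "Terri Land": {"commentCount": 130, "dislikeCount": 558, "likeCount": 1784, "viewCount": 3082566},
}
shares = add_shares(mi)
print(shares["Gary Peters"]["like_dislike_r"], shares["Gary Peters"]["views_share"])

# Raw counts copied from the WY output: nobody has any dislikes
wy = {
    "Elizabeth Cheney": {"commentCount": 48, "dislikeCount": 0, "likeCount": 30, "viewCount": 4482},
    "Michael Enzi": {"commentCount": 25, "dislikeCount": 0, "likeCount": 70, "viewCount": 1495},
}
wy_shares = add_shares(wy)
print(wy_shares["Elizabeth Cheney"]["dislikes_share"])  # None instead of inf
```

Making the undefined case explicit keeps `inf` out of later comparisons such as `views_share > 0.5`.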

In [36]:
senate_2012 = pd.read_csv("data/2012_senate_results.csv")
senate_2012["Full Name"] = senate_2012["First Name"] + " "  + senate_2012["Last Name"]
senate_2012


Out[36]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 126 entries, 0 to 125
Data columns (total 9 columns):
State Postal    126  non-null values
County Name     126  non-null values
Party           126  non-null values
First Name      126  non-null values
Last Name       126  non-null values
Incumbent       126  non-null values
Vote Count      126  non-null values
Winner          33  non-null values
Full Name       126  non-null values
dtypes: int64(2), object(7)
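
The next cell keeps, for every state, the two candidates with the highest `Vote Count` by calling `cands.sort("Vote Count", ascending=False)[:2]`. `DataFrame.sort` belongs to the old pandas release used in this notebook and is no longer available in later releases, where `sort_values` is the replacement. A minimal sketch of the same selection with the newer API follows. The `toy` frame, its placeholder names, and its vote counts are invented for illustration and are not taken from `2012_senate_results.csv`.

```python
import pandas as pd

# Invented placeholder rows that reuse the column names of senate_2012
toy = pd.DataFrame({
    "State Postal": ["XX", "XX", "XX", "YY", "YY"],
    "Full Name": ["Candidate A", "Candidate B", "Candidate C", "Candidate D", "Candidate E"],
    "Vote Count": [500, 1200, 900, 300, 700],
})

# Sort once by votes, then keep the first two rows of every state
top_two = (
    toy.sort_values("Vote Count", ascending=False)
       .groupby("State Postal")
       .head(2)
)

# Collect the selected names per state, highest vote count first
names_by_state = top_two.groupby("State Postal")["Full Name"].apply(list).to_dict()
print(names_by_state)
```

Sorting once and using `groupby(...).head(2)` avoids the per-state loop, although a loop is still needed to call `get_state_data` for each pair.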

In [37]:
# Columns to be filled in from the per-state YouTube statistics
stat_columns = ["commentCount", "dislikeCount", "favoriteCount", "likeCount", "viewCount",
                "like_dislike_r", "views_share", "msgs_share", "likes_share", "dislikes_share",
                "sentiment"]
for column in stat_columns:
    senate_2012[column] = pd.Series()

for state in np.unique(senate_2012["State Postal"]):
    print state + ":"
    cands = senate_2012[senate_2012["State Postal"] == state]
    # Keep the two candidates with the most votes in this state
    top_cands = cands.sort("Vote Count", ascending=False)[:2]
    try:
        youtube_stats = get_state_data(top_cands["Full Name"].values)
        # Store the statistics back on each candidate's row
        for cand, stats in youtube_stats.iterrows():
            index = int(senate_2012[senate_2012["Full Name"] == cand].index)
            for column in stat_columns:
                senate_2012.loc[index, column] = stats[column]
    except Exception:
        # Skip states for which no YouTube statistics could be collected
        pass


AZ:
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Jeff Flake                542          1240              0       2927    1829090        0.702424      0.58453   
Richard Carmona           513          2112              0       4590    1300075        0.684870      0.41547   

                 msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                       
Jeff Flake         0.513744     0.389384        0.369928   0.016247  
Richard Carmona    0.486256     0.610616        0.630072   0.044413  
CA:
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Dianne Feinstein         12130          1690              0      16760    3165370        0.908401       0.7182   
Elizabeth Emken           4492           437              0       6777    1241994        0.939423       0.2818   

                  msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                        
Dianne Feinstein    0.729756      0.71207        0.794546  -0.000463  
Elizabeth Emken     0.270244      0.28793        0.205454  -0.006897  
CT:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Chris Murphy             233            88              0        447     159171        0.835514     0.371103   
Linda McMahon           3962           418              0       4062     269742        0.906696     0.628897   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Chris Murphy      0.055542     0.099135        0.173913   0.004893  
Linda McMahon     0.944458     0.900865        0.826087   0.029649  
DE:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Kevin Wade                60            10              0         70      11400           0.875     0.276123   
Thomas Carper             28            10              0         70      29886           0.875     0.723877   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Kevin Wade        0.681818          0.5             0.5  -0.031250  
Thomas Carper     0.318182          0.5             0.5   0.018583  
FL:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bill Nelson              376            59              0        937     511466        0.940763     0.865892   
Connie Mack              164            67              0        249      79215        0.787975     0.134108   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bill Nelson       0.696296     0.790051        0.468254   0.021819  
Connie Mack       0.303704     0.209949        0.531746   0.015071  
HI:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Linda Lingle             343           451              0        672     480724        0.598397     0.476687   
Mazie Hirono             367           577              0        924     527744        0.615590     0.523313   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Linda Lingle      0.483099     0.421053        0.438716   0.023237  
Mazie Hirono      0.516901     0.578947        0.561284   0.065901  
IN:
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Joe Donnelly              3011           931              0       1203    2646352        0.563730     0.756489   
Richard Mourdock          7869          1467              0       8777     851850        0.856794     0.243511   

                  msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                        
Joe Donnelly        0.276746     0.120541         0.38824  -0.009662  
Richard Mourdock    0.723254     0.879459         0.61176   0.043522  
MA:
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Elizabeth Warren         10007          2208              0      15744    2458138        0.877005     0.718429   
Scott Brown               5226          1259              0       6490     963410        0.837527     0.281571   

                  msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                        
Elizabeth Warren    0.656929     0.708105        0.636862   0.014748  
Scott Brown         0.343071     0.291895        0.363138   0.019217  
MD:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Ben Cardin               301            90              0        386      61822        0.810924      0.82694   
Daniel Bongino           162            10              0        162      12938        0.941860      0.17306   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Ben Cardin        0.650108      0.70438             0.9   0.008083  
Daniel Bongino    0.349892      0.29562             0.1  -0.018750  
ME:
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Angus King                 91           133              0        385      39436        0.743243     0.911689   
Charles Summers             0            10              0         40       3820        0.800000     0.088311   

                 msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                       
Angus King                1     0.905882         0.93007   0.053315  
Charles Summers           0     0.094118         0.06993   0.016667  
MI:
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Debbie Stabenow            63            83              0        595     428223        0.877581     0.870664   
Pete Hoekstra              57           187              0        398      63612        0.680342     0.129336   

                 msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                       
Debbie Stabenow       0.525     0.599194        0.307407   0.062909  
Pete Hoekstra         0.475     0.400806        0.692593   0.121008  
MN:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Amy Klobuchar            264           242              0        430      97354        0.639881     0.659651   
Kurt Bills               300           100              0        460      50230        0.821429     0.340349   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Amy Klobuchar     0.468085     0.483146        0.707602   0.030468  
Kurt Bills        0.531915     0.516854        0.292398   0.160000  
MO:
                  commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                   
Claire McCaskill           525           438              0        765     141427        0.635910     0.036271   
Todd Akin                44285          5817              0      66830    3757741        0.919928     0.963729   

                  msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                        
Claire McCaskill    0.011716     0.011317        0.070024   0.009500  
Todd Akin           0.988284     0.988683        0.929976  -0.016794  
MS:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Albert Gore              370             0              0        260      75470        1.000000      0.70684   
Roger Wicker             120            93              0        203      31301        0.685811      0.29316   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Albert Gore       0.755102     0.561555               0  -0.050000  
Roger Wicker      0.244898     0.438445               1   0.029821  
MT:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Denny Rehberg           4500           351              0      10651     390033        0.968097     0.807661   
Jon Tester               480            72              0       1172      92884        0.942122     0.192339   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Denny Rehberg     0.903614     0.900871        0.829787   0.003883  
Jon Tester        0.096386     0.099129        0.170213   0.023900  
ND:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Heidi Heitkamp           315           289              0        726     773197        0.715271     0.574998   
Rick Berg                331           200              0        573     571499        0.741268     0.425002   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Heidi Heitkamp    0.487616     0.558891        0.591002   0.007323  
Rick Berg         0.512384     0.441109        0.408998   0.033317  
NE:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bob Kerrey               328           179              0       1526     526859        0.895015     0.616558   
Deb Fischer              441           272              0       2076     327657        0.884157     0.383442   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bob Kerrey        0.426528     0.423654        0.396896   0.015867  
Deb Fischer       0.573472     0.576346        0.603104   0.011278  
NJ:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bob Menendez             601           232              0        819     132137        0.779258     0.777057   
Joe Kyrillos              87           189              0        198      37911        0.511628     0.222943   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bob Menendez      0.873547      0.80531        0.551069  -0.017692  
Joe Kyrillos      0.126453      0.19469        0.448931   0.083939  
NM:
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Heather Wilson            170           330              0        460     112560        0.582278     0.116505   
Martin Heinrich           380           790              0       2580     853580        0.765579     0.883495   

                 msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                       
Heather Wilson     0.309091     0.151316        0.294643   0.064444  
Martin Heinrich    0.690909     0.848684        0.705357   0.058437  
NV:
                 commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                  
Dean Heller               286           748              0       1077    1014221        0.590137     0.429432   
Shelley Berkley           434           402              0        884    1347552        0.687403     0.570568   

                 msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                       
Dean Heller        0.397222      0.54921        0.650435  -0.003463  
Shelley Berkley    0.602778      0.45079        0.349565   0.005413  
NY:
                    commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                     
Kirsten Gillibrand           684           167              0       1452     399903        0.896850     0.638157   
Wendy Long                   494            83              0        740     226750        0.899149     0.361843   

                    msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                          
Kirsten Gillibrand    0.580645     0.662409           0.668   0.030444  
Wendy Long            0.419355     0.337591           0.332   0.074461  
OH:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Josh Mandel              484           311              0        423     163496        0.576294     0.317986   
Sherrod Brown            292            70              0        331     350665        0.825436     0.682014   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Josh Mandel       0.623711     0.561008        0.816273   0.002392  
Sherrod Brown     0.376289     0.438992        0.183727   0.034265  
PA:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bob Casey                117           139              0        316      57281        0.694505     0.203635   
Tom Smith                793           209              0       1648     224012        0.887453     0.796365   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bob Casey         0.128571     0.160896        0.399425   0.014105  
Tom Smith         0.871429     0.839104        0.600575   0.048067  
RI:
                    commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                     
Barry Hinckley                12            12              0        138      21111        0.920000     0.404712   
Sheldon Whitehouse            10            10              0        134      31052        0.930556     0.595288   

                    msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                          
Barry Hinckley        0.545455     0.507353        0.545455   0.138462  
Sheldon Whitehouse    0.454545     0.492647        0.454545   0.033625  
TN:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bob Corker               155            20              0        162      17499        0.890110     0.258238   
Mark Clayton             230            50              0        230      50264        0.821429     0.741762   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bob Corker        0.402597     0.413265        0.285714   0.144643  
Mark Clayton      0.597403     0.586735        0.714286   0.100538  
TX:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Paul Sadler              840           257              0       1937     250148        0.882862     0.462666   
Ted Cruz                1676           255              0       2410     290519        0.904315     0.537334   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Paul Sadler       0.333863     0.445595        0.501953   0.015821  
Ted Cruz          0.666137     0.554405        0.498047   0.009659  
UT:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Orrin Hatch              495           336              0        284      38701        0.458065     0.511722   
Scott Howell              19            13              0        175      36928        0.930851     0.488278   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Orrin Hatch       0.963035     0.618736        0.962751   0.084909  
Scott Howell      0.036965     0.381264        0.037249   0.050492  
VA:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
George Allen             324           443              0        500     206940        0.530223     0.718385   
Timothy Kaine            174            90              0        272      81123        0.751381     0.281615   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
George Allen      0.650602     0.647668        0.831144   0.023067  
Timothy Kaine     0.349398     0.352332        0.168856   0.022738  
VT:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Bernie Sanders          2131           129              0       4878     223002        0.974236     0.992571   
John MacGovern             6             0              0          2       1669        1.000000     0.007429   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Bernie Sanders    0.997192      0.99959               1   0.011692  
John MacGovern    0.002808      0.00041               0   0.000000  
WA:
                     commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                      
Maria Cantwell                234            52              0        244      62934        0.824324     0.736725   
Michael Baumgartner           112            70              0        137      22490        0.661836     0.263275   

                     msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                           
Maria Cantwell         0.676301      0.64042         0.42623   0.107853  
Michael Baumgartner    0.323699      0.35958         0.57377   0.031234  
WI:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Tammy Baldwin            393           169              0        944     488400        0.848158     0.602401   
Tommy Thompson          1494           550              0       2332     322355        0.809160     0.397599   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Tammy Baldwin     0.208267     0.288156        0.235049   0.022898  
Tommy Thompson    0.791733     0.711844        0.764951   0.010391  
WV:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
Joe Manchin              675           545              0       3005     604285        0.846479     0.964792   
John Raese                74            68              0        118      22052        0.634409     0.035208   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
Joe Manchin       0.901202     0.962216         0.88907   0.006897  
John Raese        0.098798     0.037784         0.11093  -0.057143  
WY:
                commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
candidate_name                                                                                                 
John Barrasso            630           290              0        380      74030        0.567164     0.996943   
Tim Chesnut                4             2              0          2        227        0.500000     0.003057   

                msgs_share  likes_share  dislikes_share  sentiment  
candidate_name                                                      
John Barrasso     0.993691     0.994764        0.993151       0.09  
Tim Chesnut       0.006309     0.005236        0.006849       0.00  
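
The derived columns in the tables above are consistent with two simple formulas: like_dislike_r matches likeCount / (likeCount + dislikeCount), and views_share matches a candidate's viewCount divided by the total for the state. The code that produced them lies outside this section, so the following is only a sketch that reproduces the AZ numbers in plain Python 3. The `az` dict and the `derive` helper are hypothetical names, not part of the notebook.

```python
# Raw counts copied from the AZ table above.
az = {
    "Jeff Flake": {"likeCount": 2927, "dislikeCount": 1240, "viewCount": 1829090},
    "Richard Carmona": {"likeCount": 4590, "dislikeCount": 2112, "viewCount": 1300075},
}


def derive(race):
    """Hypothetical helper: recompute like_dislike_r and views_share for one race."""
    total_views = sum(c["viewCount"] for c in race.values())
    derived = {}
    for name, c in race.items():
        derived[name] = {
            # Fraction of like/dislike reactions that are likes
            "like_dislike_r": c["likeCount"] / (c["likeCount"] + c["dislikeCount"]),
            # Candidate's share of all views in this race
            "views_share": c["viewCount"] / total_views,
        }
    return derived


derived = derive(az)
for name, vals in derived.items():
    print(name, round(vals["like_dislike_r"], 6), round(vals["views_share"], 5))
```

The sketch assumes every candidate has at least one like or dislike; a candidate with none would need a guard against division by zero.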

In [38]:
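# Copy the filtered rows so the new column is not written to a view of senate_2012
cands_with_stats = senate_2012[~senate_2012["viewCount"].isnull()].copy()
# Vote share = candidate's votes / total votes for all candidates in the same state
state_totals = senate_2012.groupby("State Postal")["Vote Count"].sum()
cands_with_stats["VotesShare"] = cands_with_stats["Vote Count"] / cands_with_stats["State Postal"].map(state_totals)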

In [39]:
x_col = "views_share"
y_col = "VotesShare"

plt.figure(figsize=(15,10))
color_dict = {"Dem": "b", "GOP": "r", "Ind":"g", "NPA": "orange"}
shape_dict = {"X": "*", "nan": "."}

wl_dp = [len(cands_with_stats[(cands_with_stats[x_col]>=0.5) &
                          (cands_with_stats["Winner"]=="X")]),
                    len(cands_with_stats[(cands_with_stats[x_col]>=0.5)])]
wl_dm = [len(cands_with_stats[(cands_with_stats[x_col]<0.5) &
                          (cands_with_stats["Winner"]=="X")]),
                    len(cands_with_stats[(cands_with_stats[x_col]<0.5)])]

# Raw strings keep the mathtext "\%" escape intact
wl_50p = r"Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dp[0], wl_dp[1], wl_dp[0]/wl_dp[1]*100)
wl_50m = r"Winning Ratio %s/%s ($%0.1f \%%$)" % (wl_dm[0], wl_dm[1], wl_dm[0]/wl_dm[1]*100)

for cand in cands_with_stats.iterrows():
    stats = cand[1]
    x = stats[x_col]
    y = stats[y_col]
    c = color_dict[stats["Party"]]
    m = shape_dict[str(stats["Winner"])]
    plt.scatter(x, y, c=c, marker=m, s=500, alpha=0.5)
    if stats[x_col] > 0.9:
        plt.annotate(stats["Full Name"],xytext=(8,20), xy=(x,y),
                     textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))

plt.xlabel("YouTube " + x_col + " Between Competing Candidates in a State Race")
plt.ylabel("Actual " + y_col)
plt.vlines(.5, ymin=0, ymax=1)

plt.annotate(wl_50p, xy=(0.7, 1))
plt.annotate(wl_50m, xy=(0.2, 1))
plt.title("YouTube Video Views for Candidates from 2012-08-04 to 2012-11-04 and Actual Votes")
plt.annotate("Stars Represent Winning Candidates\nCircles Represent Losing Candidates", xy=(0.03, 0.85))
plt.annotate("Red: GOP\nBlue: Dem\nGreen: Ind\nOrange: NPA", xy=(0.03, 0.7))
plt.axis("tight")
plt.box(False)
plt.show()



In [40]:
cands_with_stats[cands_with_stats["State Postal"]=="MO"]


Out[40]:
    State Postal  County Name  Party  First Name  Last Name  Incumbent  Vote Count  Winner         Full Name  \
13            MO     Missouri    Dem      Claire  McCaskill          1     1484683       X  Claire McCaskill   
46            MO     Missouri    GOP        Todd       Akin          0     1063698     NaN         Todd Akin   

    commentCount  dislikeCount  favoriteCount  likeCount  viewCount  like_dislike_r  views_share  \
13           525           438              0        765     141427        0.635910     0.036271   
46         44285          5817              0      66830    3757741        0.919928     0.963729   

    msgs_share  likes_share  dislikes_share  sentiment  VotesShare  
13    0.011716     0.011317        0.070024   0.009500    0.547173  
46    0.988284     0.988683        0.929976  -0.016794    0.392021  
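
The Missouri rows show why a large views share does not guarantee a win: Todd Akin drew about 96% of the views but lost. The "Winning Ratio" labels on the plot count exactly this, winners among the candidates on each side of the 0.5 line. A minimal plain-Python sketch of that tally, restricted to the two rows in Out[40]; `mo_rows` and `winning_ratio` are hypothetical names, and `None` stands in for the NaN in the Winner column.

```python
# Rows copied from Out[40]; "X" marks the winner, None stands in for NaN.
mo_rows = [
    {"Full Name": "Claire McCaskill", "views_share": 0.036271, "Winner": "X"},
    {"Full Name": "Todd Akin", "views_share": 0.963729, "Winner": None},
]


def winning_ratio(rows, col, above, threshold=0.5):
    """Hypothetical helper: return (winners, total) for one side of the threshold."""
    if above:
        side = [r for r in rows if r[col] >= threshold]
    else:
        side = [r for r in rows if r[col] < threshold]
    winners = sum(1 for r in side if r["Winner"] == "X")
    return winners, len(side)


above = winning_ratio(mo_rows, "views_share", above=True)
below = winning_ratio(mo_rows, "views_share", above=False)
print("views_share >= 0.5:", above)
print("views_share <  0.5:", below)
```

Run over all candidates instead of just Missouri, the same two tuples correspond to the `wl_dp` and `wl_dm` pairs computed in cell 39.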